> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Ernie Coskrey
> Sent: Tuesday, July 31, 2007 3:40 PM
> To: cygwin@cygwin.com
> Subject: cygwin 1.5.20-1, spinning pdksh, 100% CPU
>
>
> I've run into a problem with cygwin 1.5.20-1 and pdksh
> 5.2.14. We've got a pdksh.exe process that is spinning,
> using all the CPU.
>
> This scenario is very hard to reproduce, but has happened on
> our test systems occasionally. It occurred recently, and I
> currently have gdb attached to the process and have the
> symbols loaded. I see that pdksh is continually calling
> "sigsuspend()", which is immediately returning from
> cancelable_wait because the signal_arrived event
> is set. I also see that pdksh is waiting for a subprocess to
> complete, and has a handle to the PID of that process;
> however, the process has long since terminated.
>
> It appears that something went wrong during delivery of SIGCHLD.
>
> I've got two questions related to this:
>
> - have there been changes between 1.5.20-1 and 1.5.24-2, or
> the latest snapshot, that might have fixed this issue? We've
> done some limited testing with 1.5.24-2 and haven't seen this
> happen yet, but as I said, it only happens rarely.
> - is there anything I can look at in gdb to help identify
> what the issue is?
>
> Any suggestions would be appreciated!
>
> ---------
> Ernie Coskrey
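For context, a shell waiting for a child usually follows the standard POSIX pattern below. SIGCHLD is blocked, a flag is tested, and sigsuspend() is called until the handler sets that flag. This is a generic sketch, not pdksh's actual j_waitj code. The names child_exited, on_sigchld and run_and_wait are hypothetical. The comment on the loop shows how sigsuspend() returning repeatedly without the handler ever running would produce the 100% CPU spin described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag set by the SIGCHLD handler (not a pdksh identifier). */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;
}

/* Fork a child that exits with `code`, then wait for it using the
 * block-SIGCHLD / sigsuspend() loop. Returns the child's exit status,
 * or -1 on error. */
int run_and_wait(int code)
{
    struct sigaction sa;
    sigset_t block, old;
    pid_t pid;
    int status;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD before forking so the notification cannot arrive
     * between the flag test and the call to sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    child_exited = 0;
    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(code);

    /* sigsuspend() atomically installs the old mask and sleeps until a
     * handler runs. If it kept returning without on_sigchld() ever
     * running, this loop would spin, much like the report above. */
    while (!child_exited)
        sigsuspend(&old);

    sigprocmask(SIG_SETMASK, &old, NULL);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_and_wait(7) forks a child that exits with status 7 and returns 7 once the handler has run and the child has been reaped.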
I've discovered something that I think is related to this, and I'm hoping it rings a bell with someone on the list.

With a breakpoint set in handle_sigsuspend just after the cancelable_wait() call, _main_tls->stack[] contains the following entries:

  0x6109186f
  0x4132ac

0x6109186f is sigdelayed(), the routine that should have been called to deliver the signal and reset the signal_arrived event. 0x4132ac is j_waitj (in pdksh).

So, somehow, when this problem occurs, sigdelayed gets pushed onto the stack *before* j_waitj's return address. Because _sigbe returns through the most recently pushed entry (j_waitj), it never calls sigdelayed. That would leave signal_arrived set, which would explain why every subsequent sigsuspend() returns immediately.

I don't think there's ever a case where sigdelayed should be at _main_tls->stack[0]. Whatever causes this ordering is, I believe, the cause of the problem.

Ernie Coskrey
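The ordering problem can be illustrated with a deliberately simplified model of a per-thread LIFO of return addresses. This is a toy stand-in for _main_tls->stack[], not Cygwin's real code or layout. The struct toy_tls, the functions toy_push, toy_pop, first_return_expected and first_return_observed, and the "expected" ordering are hypothetical. They assume, as the message implies, that sigdelayed should sit on top of the caller's return address so that _sigbe reaches it first. Only the two addresses come from the gdb session.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy LIFO of return addresses: a hypothetical stand-in for
 * _main_tls->stack[], not Cygwin's actual data structure. */
#define TOY_STACK_MAX 16

struct toy_tls {
    uintptr_t stack[TOY_STACK_MAX];
    size_t depth;
};

/* Addresses observed in the gdb session. */
#define ADDR_SIGDELAYED ((uintptr_t)0x6109186f)
#define ADDR_J_WAITJ    ((uintptr_t)0x4132ac)

static int toy_push(struct toy_tls *t, uintptr_t addr)
{
    if (t->depth == TOY_STACK_MAX)
        return -1;
    t->stack[t->depth++] = addr;
    return 0;
}

/* Models the _sigbe step: return through the most recently pushed
 * entry. Returns 0 if the stack is empty. */
static uintptr_t toy_pop(struct toy_tls *t)
{
    return t->depth ? t->stack[--t->depth] : 0;
}

/* Assumed correct order: caller's return address first, then
 * sigdelayed on top of it, so sigdelayed is reached first. */
uintptr_t first_return_expected(void)
{
    struct toy_tls t = {0};
    toy_push(&t, ADDR_J_WAITJ);
    toy_push(&t, ADDR_SIGDELAYED);
    return toy_pop(&t);
}

/* Order seen in the report: sigdelayed at stack[0], below j_waitj,
 * so the pop returns to j_waitj and sigdelayed stays buried. */
uintptr_t first_return_observed(void)
{
    struct toy_tls t = {0};
    toy_push(&t, ADDR_SIGDELAYED);
    toy_push(&t, ADDR_J_WAITJ);
    return toy_pop(&t);
}
```

In this model, each retry of sigsuspend() would push and pop j_waitj again above the buried entry, so sigdelayed would never be popped.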