On Mon, Jan 26, 2004 at 07:37:23PM +0000, Colm MacCarthaigh wrote:
> On Mon, Jan 26, 2004 at 06:28:03PM +0000, Colm MacCarthaigh wrote:
> > > I'd love to find out what's causing your worker failures. Are you using
> > > any thread-unsafe modules or libraries?
> >
> > Not to my knowledge, I wasn't planning to do this till later, but
> > I've bumped to 2.1, I'll try out the forensic_id and backtrace
> > modules right now, and see how that goes.
>
> *sigh*, forensic_id didn't catch it, backtrace didn't catch it,
> whatkilledus didn't catch it, all tried individually. The parent just
> dumps core; the children live on, serve their content and log their
> request and then drop off one by one. No uncomplete requests,
> no backtrace or other exception info thrown into any log.
>
> corefile is as useful as ever, unbacktracable. suggestions welcome!

Have you tried setting up a signal handler for SIGSEGV and calling
kill(getpid(), SIGSTOP); in the signal handler? After attaching to the
process with gdb, send a CONT signal to the process from another
terminal. It's worth a shot.

(Is the process dying from SIGSEGV or some other signal? Does the core
file tell you?)

Can you get a tcpdump of the traffic leading up to the crash? (Yeah, I
know it would be a lot.) If you can get a tcpdump, and then can replay
the traffic and reproduce it, more of us can look at this.

Cheers,
Glenn