Synopsis: Children die. Parent stops serving requests

State-Changed-From-To: feedback-analyzed
State-Changed-By: dgaudet
State-Changed-When: Sat May 1 10:39:02 PDT 1999
State-Changed-Why:
I examined the straces a while ago, but forgot to comment. Here's a portion of the parent's trace:

    time(NULL) = 909702870
    wait4(-1, 0xbffffe64, WNOHANG, NULL) = 0
    select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
    time(NULL) = 909702871
    fork() = 26032
    wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 26032
    --- SIGCHLD (Child exited) ---
    wait4(-1, 0xbffffe64, WNOHANG, NULL) = -1 ECHILD (No child processes)
    select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
    time(NULL) = 909703113

Somehow 242 seconds passed between the last two time() calls (909702871 and 909703113), even though the only thing in between that should take any time is a select() with a one-second timeout. The parent does nothing CPU-intensive, so I doubt it's that. It's possible the guy's box is swapping to hell... but we've got about a dozen similar reports. The reports are against 2.0.30, 2.0.32, and 2.0.33.

Oh, and then there's the odd SIGCHLD followed by ECHILD: the child was already reaped by the preceding wait4(), yet a SIGCHLD arrives afterwards. There are a few other instances of that, with SIGCHLDs happening and wait4() not reporting anything.

The short answer: kernel problem. Alan Cox hasn't heard of this problem before, so it's probably an unknown problem.

Dean
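
For anyone wanting to check their own straces for the same stall, here is a rough Python sketch that scans a trace for successive time(NULL) results and reports any pair that is further apart than expected. The function name time_gaps and the 5-second threshold are my own assumptions for illustration, not anything taken from Apache or strace; the sample input is the trace excerpt quoted above.

```python
import re

# Matches strace lines of the form "time(NULL) = 909702870".
TIME_CALL = re.compile(r"^time\(NULL\)\s*=\s*(\d+)$")


def time_gaps(trace_lines, threshold=5):
    """Return (previous, current, delta) for successive time() results
    whose difference exceeds `threshold` seconds (hypothetical default)."""
    gaps = []
    previous = None
    for line in trace_lines:
        match = TIME_CALL.match(line.strip())
        if not match:
            continue  # not a time() call, skip it
        current = int(match.group(1))
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


TRACE = """\
time(NULL) = 909702870
wait4(-1, 0xbffffe64, WNOHANG, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
time(NULL) = 909702871
fork() = 26032
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 26032
--- SIGCHLD (Child exited) ---
wait4(-1, 0xbffffe64, WNOHANG, NULL) = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
time(NULL) = 909703113
"""

print(time_gaps(TRACE.splitlines()))
# prints [(909702871, 909703113, 242)]
```

The first pair of time() calls is one second apart and is ignored; only the 242-second jump is reported. The threshold would need adjusting for traces where the loop legitimately sleeps longer than a second.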