[In order for any reply to be added to the PR database, ]
[you need to include <[EMAIL PROTECTED]> in the Cc line ]
[and leave the subject line UNCHANGED. This is not done]
[automatically because of the potential for mail loops. ]
[If you do not include this Cc, your reply may be ignored ]
[unless you are responding to an explicit request ]
[from a developer. ]
[Reply only with text; DO NOT SEND ATTACHMENTS! ]
Synopsis: Children die. Parent stops serving requests
State-Changed-From-To: feedback-analyzed
State-Changed-By: dgaudet
State-Changed-When: Sat May 1 10:39:02 PDT 1999
State-Changed-Why:
I examined the straces a while ago, but forgot to comment.
Here's a portion of the parent's trace:
time(NULL) = 909702870
wait4(-1, 0xbffffe64, WNOHANG, NULL) = 0
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
time(NULL) = 909702871
fork() = 26032
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 26032
--- SIGCHLD (Child exited) ---
wait4(-1, 0xbffffe64, WNOHANG, NULL) = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
time(NULL) = 909703113
Somehow 242 seconds passed between the last two time() calls (909702871
and 909703113), even though the only sleep in between is a one-second
select()... the parent does nothing cpu intensive, so I doubt cpu load
explains it. It's possible the guy's box is swapping to hell... but we've
got about a dozen similar reports. The reports are against 2.0.30, 2.0.32,
and 2.0.33.
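To make the arithmetic concrete, here is a minimal C sketch of one pass of an idle loop shaped like the trace above: a non-blocking reap, a one-second select() sleep, then a clock check. It is not the server's actual code; the names overrun_seconds and idle_pass are hypothetical, and waitpid() stands in for the wait4() call seen in the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: how many seconds an interval ran past what was
 * expected. Returns 0 when the interval was on time or short. */
long overrun_seconds(time_t before, time_t after, long expected)
{
    long elapsed;

    assert(expected >= 0);
    elapsed = (long)difftime(after, before);
    return elapsed > expected ? elapsed - expected : 0;
}

/* Hypothetical idle pass modelled on the trace: reap any exited children
 * without blocking, sleep one second with select(), then report how far
 * the wall clock ran past that one second. */
long idle_pass(void)
{
    int status;
    struct timeval tv = { 1, 0 };   /* same {1, 0} timeout as in the trace */
    time_t before = time(NULL);

    while (waitpid(-1, &status, WNOHANG) > 0)
        ;                           /* collect every child that has exited */

    select(0, NULL, NULL, NULL, &tv);
    return overrun_seconds(before, time(NULL), 1);
}
```

Fed the two timestamps from the trace, overrun_seconds(909702871, 909703113, 1) gives 241, i.e. the pass took 241 seconds longer than its one-second sleep accounts for.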
Oh, then there's the odd SIGCHLD followed by ECHILD... there are a few other
instances of that -- SIGCHLDs happening and wait4() not reporting
anything.
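For reference, here is a small C sketch of what those return values mean. It only shows that a wait call returns -1 with ECHILD once the process has no children left to wait for; it does not reproduce or explain the reported behaviour. The function name reap_then_echild is hypothetical, waitpid() stands in for wait4(), and the sketch assumes the calling process has no other children and leaves SIGCHLD at its default disposition.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that exits at once, reap it, then ask
 * again. Returns 1 if the second call fails with ECHILD, 0 if it does
 * not, and -1 if fork() itself fails. */
int reap_then_echild(void)
{
    int status;
    pid_t reaped;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                   /* child: exit immediately, status 0 */

    /* Parent: block until the one child has been collected. */
    reaped = waitpid(pid, &status, 0);
    assert(reaped == pid);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 0;

    /* No children remain, so a non-blocking wait has nothing to report. */
    errno = 0;
    if (waitpid(-1, &status, WNOHANG) == -1 && errno == ECHILD)
        return 1;
    return 0;
}
```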
The short answer: kernel problem. Alan Cox hasn't heard of
this problem before, so it's probably an unknown problem.
Dean