Synopsis: Children die. Parent stops serving requests

State-Changed-From-To: feedback-analyzed
State-Changed-By: dgaudet
State-Changed-When: Sat May  1 10:39:02 PDT 1999
State-Changed-Why:
I examined the straces a while ago, but forgot to comment.
Here's a portion of the parent's trace:

time(NULL)                              = 909702870
wait4(-1, 0xbffffe64, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 909702871
fork()                                  = 26032
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 26032
--- SIGCHLD (Child exited) ---
wait4(-1, 0xbffffe64, WNOHANG, NULL)    = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 909703113

Somehow 242 seconds passed between the last two time() calls (909702871
and 909703113), even though the select() in between has a 1 second
timeout.  The parent does nothing CPU intensive, so I doubt it's that.
It's possible the guy's box is swapping to hell... but we've got about a
dozen similar reports.  The reports are against 2.0.30, 2.0.32, and 2.0.33.
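
For anyone who wants to check the gap in their own traces, here is a rough
sketch that pulls the time() return values out of the excerpt above and
prints the difference between consecutive calls.  The helper name
time_gaps and the TRACE variable are made up for illustration, and it
assumes strace prints time(NULL) lines in exactly the format shown here.

```python
import re

# The parent's strace excerpt quoted above, copied verbatim.
TRACE = """\
time(NULL)                              = 909702870
wait4(-1, 0xbffffe64, WNOHANG, NULL)    = 0
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 909702871
fork()                                  = 26032
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 26032
--- SIGCHLD (Child exited) ---
wait4(-1, 0xbffffe64, WNOHANG, NULL)    = -1 ECHILD (No child processes)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
time(NULL)                              = 909703113
"""

def time_gaps(trace):
    """Return the seconds elapsed between consecutive time(NULL) calls.

    Hypothetical helper: only matches lines of the form
    'time(NULL) = <seconds>' as they appear in the excerpt.
    """
    stamps = [
        int(m.group(1))
        for m in re.finditer(r"^time\(NULL\)\s*=\s*(\d+)$", trace, re.MULTILINE)
    ]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

gaps = time_gaps(TRACE)
print(gaps)  # [1, 242]
```

The first gap is the expected one second from the select() timeout; the
second is the 242 second stall.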

Oh, then there's the odd SIGCHLD followed by ECHILD... there are a few other
instances of that -- SIGCHLDs happening and wait4() not reporting
anything.

The short answer:  kernel problem.  Alan Cox hasn't heard of
this problem before, so it's probably an unknown problem.

Dean
