Greg Ames wrote:
> 
> Justin Erenkrantz wrote:
> >
> > On Mon, Jan 28, 2002 at 05:44:38PM -0500, Greg Ames wrote:
> > > > I'll leave it alone for an hour or two and then restart it unless
> > > > someone volunteers to investigate this.
> > >
> > > I just bounced us back to 2_0_28.  Thanks for pointing this out, Manoj.
> >
> > What were we on?  HEAD?
> 
> 2.0.29
> > Anything in the logs?
> 
> see http://www.apache.org/~gregames/errorlog.28Jan
> 
> Tons of "normal" errors.  The parent was alive at midnight, because the first
> thing in the log is the msg you get after a graceful restart.  I don't see the
> strings "nasty" (from the signal handler for sigsegv etc) or "fatal".

looking a little closer, "Fatal" is indeed in the log:

[Mon Jan 28 06:06:32 2002] [error] (23)Too many open files in system: apr_accept: (client socket)
[Mon Jan 28 06:06:32 2002] [error] (23)Too many open files in system: apr_accept: (client socket)
[Mon Jan 28 06:06:32 2002] [error] (23)Too many open files in system: apr_accept: (client socket)
[Mon Jan 28 06:06:32 2002] [error] (23)Too many open files in system: apr_accept: (client socket)
[Mon Jan 28 06:06:32 2002] [error] (23)Too many open files in system: apr_accept: (client socket)

[there are bunches of these]

[Mon Jan 28 06:06:32 2002] [alert] Child 50368 returned a Fatal error... Apache is exiting!

...so httpd 2.0.29's parent was a victim of the system fd shortage Brian B
pointed out.  Maybe the cause too, but I doubt it.  I will look closer and see
if we can figure out who is eating the fds.  Justin had some ideas for
troubleshooting that might help.
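One cheap way to check whether httpd itself is leaking descriptors is to have a child count its own open fds now and then and log the number. Below is a minimal C sketch, assuming a POSIX system. count_open_fds() is a hypothetical helper, not an existing APR or httpd function, and it only sees the calling process; finding other consumers system-wide still needs a tool like fstat or lsof.

```c
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: count the descriptors open in the calling process.
 * It probes every slot below the soft RLIMIT_NOFILE limit (capped at 65536
 * to keep the loop cheap) with fcntl(F_GETFD), which fails for slots that
 * are not open. Descriptors above the cap are not counted. */
static long count_open_fds(void)
{
    struct rlimit rl;
    long limit = 1024;   /* assumption: fallback if the limit is unknown */
    long n = 0;

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        limit = (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > 65536)
                    ? 65536 : (long)rl.rlim_cur;

    for (long fd = 0; fd < limit; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1)
            n++;
    }
    return n;
}
```

Logging this value from each child at intervals would show whether the per-child count climbs over time, which points at a leak in httpd rather than some other process.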

prefork's error paths for apr_accept cause it to bail with a fatal error in this
case.  Is that the Right Thing for this error?  Another possibility is to let
individual children die without killing the parent.  That might free up enough
system fds to let us limp along until the spike goes away, then recover fully.
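A minimal sketch of that second option, assuming the accept failure has already been reduced to a plain errno value. The enum and classify_accept_error() are hypothetical names for illustration, not prefork's actual code. The idea is that descriptor and memory shortages (ENFILE is the "Too many open files in system" error in the log above) make only the affected child exit, while truly unexpected errors still take the whole server down.

```c
#include <errno.h>

/* Hypothetical outcome of a failed accept() in a prefork-style child. */
typedef enum {
    ACCEPT_RETRY,       /* transient: loop back to accept() */
    ACCEPT_CHILD_EXIT,  /* resource shortage: this child exits, parent lives */
    ACCEPT_FATAL        /* unexpected: ask the parent to shut down */
} accept_action;

/* Map an errno from accept() to an action (illustrative policy only). */
static accept_action classify_accept_error(int err)
{
    switch (err) {
    case EINTR:         /* interrupted by a signal */
    case ECONNABORTED:  /* client went away before we accepted */
    case EAGAIN:        /* nothing pending on a non-blocking socket */
        return ACCEPT_RETRY;
    case ENFILE:        /* system-wide file table full */
    case EMFILE:        /* per-process descriptor limit reached */
    case ENOBUFS:       /* out of socket buffers */
    case ENOMEM:        /* out of kernel memory */
        return ACCEPT_CHILD_EXIT;
    default:
        return ACCEPT_FATAL;
    }
}
```

Under this policy the parent would treat a child exiting for ACCEPT_CHILD_EXIT like any other child death and respawn it later, so the server shrinks during an fd shortage instead of exiting outright.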

Greg
