Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

2009-01-06 Thread Jeff Trawick
On Tue, Feb 5, 2008 at 7:53 AM, Joe Orton  wrote:

> On Fri, Feb 01, 2008 at 10:41:39AM +0100, Stefan Fritsch wrote:
> > Joe Orton wrote:
> > > I mentioned in the bug that the signal handler could cause undefined
> > > behaviour, but I'm not sure now whether that is true.  On Linux I can
> > > reproduce some cases where this will happen, which are all due to
> > > well-defined behaviour:
> > >
> > > 1) with some (default on Linux) accept mutex types,
> > > apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked
> > > waiting for the mutex do "hang" until the mutex is released.  Fixing
> > > this would need some APR work, new interfaces, blah
> >
> > This is not a problem. On graceful-stop or reload the processes will get
> > the lock one by one and die (or hang somewhere else). I have never seen a
> > left over process hanging in this function.
>
> Well, normally all children will be woken up and take the accept mutex
> because of the dummy connections.  But if you have one child blocked
> because of issue (3) - whilst holding the accept mutex - all the other
> children will also be blocked.  If the EINTR could be processed at MPM
> level, this wouldn't happen.  So I think it is a problem, though you
> could argue that solving (3) also sort of solves (1).
>
> > > I can also reproduce a third case, but I'm not sure about the cause:
> > >
> > > 3) apr_pollset_poll() is blocking despite the fact that the listening
> > > fds are supposedly already closed before entering the syscall.
> >
> > This is the main problem in my experience.
> ...
> > On Linux with epoll, the hanging processes just block in
> > apr_pollset_poll(), so checking the return value won't do any good.
> >
> > Maybe the problem is that (AIUI) poll() returns POLLNVAL if a fd is not
> > open, while epoll() does not have something similar. In epoll.c, a comment
> > says "APR_POLLNVAL is not handled by epoll". Or should epoll return
> > EPOLLHUP in this case?
>
> I did some more research on this: the case is covered in the epoll(7)
> man page - fds are removed from any containing epoll sets on closure.
> So it is well-defined behaviour, and the "hang" is expected; when all
> the listeners are closed, the poll set becomes empty, so the
> apr_pollset_poll() call will sleep forever, or until interrupted by
> signal!
>
> select() and poll() will indeed return POLLNVAL for the closed-fds case,
> and prefork needs to check for that.
>
> From some brief googling, FreeBSD kqueue appears to have the same
> guarantee.  This PR has some investigation of what happens with Solaris
> ports: http://issues.apache.org/bugzilla/show_bug.cgi?id=42580
>
> For the graceful-stop case, it would be simple enough to just signal any
> dozy children again to wake them up in the wait-for-exit loop, but
> graceful-restart doesn't have that opportunity, so I'm not sure about a
> general solution.  Reducing the poll timeout to some non-infinite time
> would work.


This holds up to some very light graceful-restart testing on OpenSolaris
(the same light testing that triggered a hang):

Index: server/mpm/prefork/prefork.c
===================================================================
--- server/mpm/prefork/prefork.c	(revision 731724)
+++ server/mpm/prefork/prefork.c	(working copy)
@@ -540,10 +540,12 @@
 apr_int32_t numdesc;
 const apr_pollfd_t *pdesc;

-/* timeout == -1 == wait forever */
-status = apr_pollset_poll(pollset, -1, &numdesc, &pdesc);
+/* timeout == 10 seconds to avoid a hang at graceful restart/stop
+ * caused by the closing of sockets by the signal handler
+ */
+status = apr_pollset_poll(pollset, apr_time_from_sec(10), &numdesc, &pdesc);
 if (status != APR_SUCCESS) {
-if (APR_STATUS_IS_EINTR(status)) {
+if (APR_STATUS_IS_TIMEUP(status) || APR_STATUS_IS_EINTR(status)) {
 if (one_process && shutdown_pending) {
 return;
 }
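For readers without APR at hand, the idea behind the patch above can be shown with raw epoll. This is a minimal sketch, assuming Linux epoll and a caller-supplied flag standing in for prefork's `die_now`; `wait_for_listener` and `stop` are hypothetical names, not prefork code. The point is that a finite timeout lets the loop notice a shutdown request even when the signal handler ran just before the call and the epoll set is already empty.

```c
#include <errno.h>
#include <signal.h>
#include <sys/epoll.h>

/* Wait for a readable listener in epfd, never blocking longer than
 * timeout_ms per epoll_wait() call. After every timeout or EINTR the
 * flag *stop (a stand-in for prefork's die_now) is re-checked.
 * With timeout_ms == -1 and an epoll set that has silently become empty,
 * epoll_wait() would only return if yet another signal arrived.
 * Assumes each listener was registered with ev.data.fd set to its fd.
 * Returns the ready fd, or -1 if *stop is set or on a real error. */
static int wait_for_listener(int epfd, int timeout_ms,
                             volatile sig_atomic_t *stop)
{
    struct epoll_event ev;

    for (;;) {
        int n = epoll_wait(epfd, &ev, 1, timeout_ms);

        if (n > 0)
            return ev.data.fd;      /* a listener is readable */
        if (n < 0 && errno != EINTR)
            return -1;              /* unexpected error */
        if (*stop)                  /* timed out (n == 0) or interrupted */
            return -1;
    }
}
```

Called on an empty set with the flag already set, as in the race where the handler closed the listeners between the caller's own `die_now` check and the wait, this returns after a single timeout instead of sleeping forever. The cost of the approach is that idle children wake up once per timeout period.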


Re: PR42829

2008-05-30 Thread Paul Querna

Stefan Fritsch wrote:
> My mail in January already mentioned that the patch is in Debian, but
> I guess now after the openssl debacle people are more sensitive. If
> you think it would help, I could go through our patches and post a
> list of the non-Debian specific ones here.


I think that would be helpful and a good thing,

Thanks,

-Paul


Re: PR42829

2008-05-30 Thread Stefan Fritsch
On Friday 30 May 2008, Nick Kew wrote:
> I don't think I share your implied view about how grave this is.

I guess this is the main (or only?) problem with this patch/bug. I got 
quite a few people complaining about it and therefore I wanted to fix 
it.

> I respect your opinion, but when maintaining your own patches,
> please consider also the problems discussed in my article at
> http://www.regdeveloper.co.uk/2006/11/04/apache_packages_support_vacuum/
> (which goes to the heart of why Debian may get a pretty
> hostile reception amongst some Apache folks).

Yes, this is definitely a problem, but not easy to fix. I hope I will 
find some time soon to try to improve the situation. In any case the 
problem is less about patches but more about the configuration and 
the additional scripts we ship with apache. For example the 
configuration is split into many small files, which makes upgrades
easier given the way dpkg handles config files.

> > To take it to the extreme, a fork being called 'Apache' isn't
> > acceptable either.  Please work with us here, even though it's a
> > very low barrier for you to put patches in your package, much
> > lower than to get it applied upstream (here).

Fixing bugs is not forking. We don't include many patches that are not 
either bug fixes or related to build or file system layout issues.
For example we don't add features or change the behaviour (unless the 
component comes in a separate package that is clearly marked as 
non-standard, like the mpm-itk). And for the bug fixes, these are 
usually from branches/2.2.x or from the Apache bugzilla.

> To be fair, I think Stefan _is_ working with us: he's put his patch
> in bugzilla, and (now, though not originally) he's raised it on
> the list.

I raised the issue in January 
(http://marc.info/?l=apache-httpd-dev&m=119945416529706&w=2) and 
there was some discussion with Joe Orton, but no conclusion about 
what would be the proper fix. But since I had a fix that worked for 
me, I didn't see any reason to revert the patch.

My mail in January already mentioned that the patch is in Debian, but 
I guess now after the openssl debacle people are more sensitive. If 
you think it would help, I could go through our patches and post a 
list of the non-Debian specific ones here.

Cheers,
Stefan


Re: PR42829

2008-05-30 Thread Joe Orton
On Thu, May 29, 2008 at 03:34:21PM -0700, Paul Querna wrote:
> Stefan Fritsch wrote:
>> https://issues.apache.org/bugzilla/attachment.cgi?id=21137 has been in 
>> Debian testing and unstable for about 6 months without problems. It is not 
>> an elegant solution but it works. Considering that it is not clear what an
>> elegant solution would look like, including this patch might make sense.
>
> Please don't put these kinds of patches into the debian apache packages,
> especially ones that don't exist in trunk.

Why not, is this some kind of distros-can't-be-trusted-to-run-patch 
reaction after the Debian OpenSSL PRNG incident?  Blah.  

1) we-the-committers failed to do anything useful about this bug
2) Stefan has a patch 
3) having more test data from people running that patch is useful

If you don't want people *patching* the *Apache source code* (god 
forbid!), then we need to go pick another license.

joe


Re: PR42829

2008-05-30 Thread Jess Holle

Eric Covener wrote:

> On Fri, May 30, 2008 at 9:07 AM, Jess Holle <[EMAIL PROTECTED]> wrote:
>
>> util_ldap*.c: still changing '#if APR_HAS_SHARED_MEMORY' to '#if 0' as last
>> we checked the shared memory stuff was still unstable with the worker MPM --
>> at least on Solaris and AIX
>
> This may be addressed by two changes that made it into 2.2.8, you may
> want to revisit (latter was diagnosed on aix w/ worker)
>
>   *) mod_ldap: Give callers a reference to data copied into the request
>  pool instead of references directly into the cache
>  PR 43786 [Eric Covener]
>
>   *) mod_ldap: Stop passing a reference to pconf around for
>  (limited) use during request processing, avoiding possible
>  memory corruption and crashes.  [Eric Covener]


Thanks for the heads up.  I'm not sure if we retested in 2.2.8.

--
Jess Holle



Re: PR42829

2008-05-30 Thread Eric Covener
On Fri, May 30, 2008 at 9:07 AM, Jess Holle <[EMAIL PROTECTED]> wrote:

> util_ldap*.c: still changing '#if APR_HAS_SHARED_MEMORY' to '#if 0' as last
> we checked the shared memory stuff was still unstable with the worker MPM --
> at least on Solaris and AIX

This may be addressed by two changes that made it into 2.2.8, you may
want to revisit (latter was diagnosed on aix w/ worker)

  *) mod_ldap: Give callers a reference to data copied into the request
 pool instead of references directly into the cache
 PR 43786 [Eric Covener]

  *) mod_ldap: Stop passing a reference to pconf around for
 (limited) use during request processing, avoiding possible
 memory corruption and crashes.  [Eric Covener]


-- 
Eric Covener
[EMAIL PROTECTED]


Re: PR42829

2008-05-30 Thread Jess Holle

Nick Kew wrote:

> As for maintaining local patches, he's not the only one doing that,
> and our license clearly allows it.  Licenses that restrict such
> things seem to be widely disliked: c.f. DJB/qmail.

We've made a concerted effort to supply all patches back, yet we always 
find that we maintain a few local patches.  We don't want to, but there 
are various bits that we just never successfully pushed back for one 
reason or another, e.g.:


   * mod_authn_alias.dep/.dsp/.mak: changes for building on Windows
 o Not sure why [as I'm no longer doing these builds myself]
   could be to allow us to build with an older MS studio
   * mod_deflate.c: added support for a response header which will
 allow responses (e.g. from Tomcat) to dynamically opt out of
 compression
 o code was suggested on the Apache lists, but uninteresting to
   Apache trunk apparently
   * util_ldap*.c: still changing '#if APR_HAS_SHARED_MEMORY' to '#if
 0' as last we checked the shared memory stuff was still unstable
 with the worker MPM -- at least on Solaris and AIX

--
Jess Holle



Re: PR42829

2008-05-30 Thread Nick Kew
On Fri, 30 May 2008 08:29:32 +0200
"Sander Striker" <[EMAIL PROTECTED]> wrote:

> > Bugs as grave as this one are not acceptable in Debian packages for
> > extended periods of time. The bug report has been open for over 1
> > year; I attached my patch on 2007-11-16. It has been marked as
> > critical since 2008-01-16. If you don't want such patches in the
> > Debian package, you need to fix such bugs faster (and comment on
> > patches in bugzilla faster). Of course I understand that this is
> > difficult because there are never enough people to fix bugs (we have
> > the same problem).

I don't think I share your implied view about how grave this is.
I respect your opinion, but when maintaining your own patches,
please consider also the problems discussed in my article at
http://www.regdeveloper.co.uk/2006/11/04/apache_packages_support_vacuum/
(which goes to the heart of why Debian may get a pretty hostile 
reception amongst some Apache folks).

> To take it to the extreme, a fork being called 'Apache' isn't
> acceptable either.  Please work with us here, even though it's a very
> low barrier for you to put patches in your package, much lower than to
> get it applied upstream (here).

To be fair, I think Stefan _is_ working with us: he's put his patch
in bugzilla, and (now, though not originally) he's raised it on
the list.

As for maintaining local patches, he's not the only one doing that,
and our license clearly allows it.  Licenses that restrict such
things seem to be widely disliked: c.f. DJB/qmail.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/


Re: PR42829

2008-05-29 Thread Paul Querna

Stefan Fritsch wrote:
> Bugs as grave as this one are not acceptable in Debian packages for
> extended periods of time.


Then change your default webserver to lighttpd.

I'm sure it's bug-free.

HTH.

-Paul


Re: PR42829

2008-05-29 Thread Sander Striker
On Fri, May 30, 2008 at 8:03 AM, Stefan Fritsch <[EMAIL PROTECTED]> wrote:
> On Friday 30 May 2008, Paul Querna wrote:
>> > https://issues.apache.org/bugzilla/attachment.cgi?id=21137 has
>> > been in Debian testing and unstable for about 6 months without
>> > problems. It is not an elegant solution but it works. Considering
>> > that it is not clear what an elegant solution would look like,
>> > including this patch might make sense.
>>
>> Please don't put these kinds of patches into the debian apache
>> packages, especially ones that don't exist in trunk.
>>
>> (Things that are committed to trunk, and just are awaiting
>> backport, I'm less concerned about, but this patch is a behavior
>> change at the core of the MPMs.)
>
> Bugs as grave as this one are not acceptable in Debian packages for
> extended periods of time. The bug report has been open for over 1
> year; I attached my patch on 2007-11-16. It has been marked as
> critical since 2008-01-16. If you don't want such patches in the
> Debian package, you need to fix such bugs faster (and comment on
> patches in bugzilla faster). Of course I understand that this is
> difficult because there are never enough people to fix bugs (we have
> the same problem).

To take it to the extreme, a fork being called 'Apache' isn't
acceptable either.  Please work with us here, even though it's a very
low barrier for you to put patches in your package, much lower than to
get it applied upstream (here).

> I admit that I should have followed up on the discussion in February,
> but I was quite busy and then forgot about it.

Cheers,

Sander


Re: PR42829

2008-05-29 Thread Stefan Fritsch
On Friday 30 May 2008, Paul Querna wrote:
> > https://issues.apache.org/bugzilla/attachment.cgi?id=21137 has
> > been in Debian testing and unstable for about 6 months without
> > problems. It is not an elegant solution but it works. Considering
> > that it is not clear what an elegant solution would look like,
> > including this patch might make sense.
>
> Please don't put these kinds of patches into the debian apache
> packages, especially ones that don't exist in trunk.
>
> (Things that are committed to trunk, and just are awaiting
> backport, I'm less concerned about, but this patch is a behavior
> change at the core of the MPMs.)

Bugs as grave as this one are not acceptable in Debian packages for 
extended periods of time. The bug report has been open for over 1 
year; I attached my patch on 2007-11-16. It has been marked as
critical since 2008-01-16. If you don't want such patches in the 
Debian package, you need to fix such bugs faster (and comment on 
patches in bugzilla faster). Of course I understand that this is 
difficult because there are never enough people to fix bugs (we have 
the same problem).

I admit that I should have followed up on the discussion in February, 
but I was quite busy and then forgot about it.

Cheers,
Stefan


Re: PR42829

2008-05-29 Thread Paul Querna

Stefan Fritsch wrote:

> On Thursday 29 May 2008, Jim Jagielski wrote:
>
>>> for 2.2.9, it would be nice to fix the epoll issue PR 42829,
>>> IMHO. The patch in the bug report works, even if it may not be
>>> the perfect solution.
>>
>> From what I can see, there is no real patch available or fully
>> tested enough to warrant anything for 2.2.9 right now.
>
> https://issues.apache.org/bugzilla/attachment.cgi?id=21137 has been in
> Debian testing and unstable for about 6 months without problems. It
> is not an elegant solution but it works. Considering that it is not
> clear what an elegant solution would look like, including this patch
> might make sense.

Please don't put these kinds of patches into the debian apache packages,
especially ones that don't exist in trunk.

(Things that are committed to trunk, and just are awaiting backport, I'm
less concerned about, but this patch is a behavior change at the core of
the MPMs.)


-Paul



Re: PR42829 (was: 2.2.9 status)

2008-05-29 Thread Stefan Fritsch
On Thursday 29 May 2008, Jim Jagielski wrote:
> > https://issues.apache.org/bugzilla/attachment.cgi?id=21137 has
> > been in Debian testing and unstable for about 6 months without
> > problems. It is not an elegant solution but it works. Considering
> > that it is not clear what an elegant solution would look like,
> > including this patch might make sense.
>
> Even if so, we cannot simply put it in 2.2.9. It needs to
> be in trunk first, then tested, then proposed for backport
> to 2.2.x and then voted on there before being backported. Timing-wise,
> it is VERY unlikely this will happen in time for 2.2.9. However,
> some other prefork fixes I just added to STATUS in hopes of adding
> them to 2.2.9...

I will bug you again after 2.2.9, then.


Re: PR42829 (was: 2.2.9 status)

2008-05-29 Thread Jim Jagielski


On May 29, 2008, at 4:46 PM, Stefan Fritsch wrote:


> On Thursday 29 May 2008, Jim Jagielski wrote:
>
>>> for 2.2.9, it would be nice to fix the epoll issue PR 42829,
>>> IMHO. The patch in the bug report works, even if it may not be
>>> the perfect solution.
>>
>> From what I can see, there is no real patch available or fully
>> tested enough to warrant anything for 2.2.9 right now.
>
> https://issues.apache.org/bugzilla/attachment.cgi?id=21137 has been in
> Debian testing and unstable for about 6 months without problems. It
> is not an elegant solution but it works. Considering that it is not
> clear what an elegant solution would look like, including this patch
> might make sense.



Even if so, we cannot simply put it in 2.2.9. It needs to
be in trunk first, then tested, then proposed for backport
to 2.2.x and then voted on there before being backported. Timing-wise,
it is VERY unlikely this will happen in time for 2.2.9. However,
some other prefork fixes I just added to STATUS in hopes of adding
them to 2.2.9...


PR42829 (was: 2.2.9 status)

2008-05-29 Thread Stefan Fritsch
On Thursday 29 May 2008, Jim Jagielski wrote:
> > for 2.2.9, it would be nice to fix the epoll issue PR 42829,
> > IMHO. The patch in the bug report works, even if it may not be
> > the perfect solution.
>
>  From what I can see, there is no real patch available or fully
> tested enough to warrant anything for 2.2.9 right now.


https://issues.apache.org/bugzilla/attachment.cgi?id=21137 has been in 
Debian testing and unstable for about 6 months without problems. It 
is not an elegant solution but it works. Considering that it is not
clear what an elegant solution would look like, including this patch
might make sense.


Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

2008-02-05 Thread Joe Orton
On Fri, Feb 01, 2008 at 10:41:39AM +0100, Stefan Fritsch wrote:
> Joe Orton wrote:
> > I mentioned in the bug that the signal handler could cause undefined
> > behaviour, but I'm not sure now whether that is true.  On Linux I can
> > reproduce some cases where this will happen, which are all due to
> > well-defined behaviour:
> >
> > 1) with some (default on Linux) accept mutex types,
> > apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked
> > waiting for the mutex do "hang" until the mutex is released.  Fixing
> > this would need some APR work, new interfaces, blah
> 
> This is not a problem. On graceful-stop or reload the processes will get
> the lock one by one and die (or hang somewhere else). I have never seen a
> left over process hanging in this function.

Well, normally all children will be woken up and take the accept mutex 
because of the dummy connections.  But if you have one child blocked 
because of issue (3) - whilst holding the accept mutex - all the other 
children will also be blocked.  If the EINTR could be processed at MPM 
level, this wouldn't happen.  So I think it is a problem, though you 
could argue that solving (3) also sort of solves (1).

> > I can also reproduce a third case, but I'm not sure about the cause:
> >
> > 3) apr_pollset_poll() is blocking despite the fact that the listening
> > fds are supposedly already closed before entering the syscall.
> 
> This is the main problem in my experience.
...
> On Linux with epoll, the hanging processes just block in
> apr_pollset_poll(), so checking the return value won't do any good.
> 
> Maybe the problem is that (AIUI) poll() returns POLLNVAL if a fd is not
> open, while epoll() does not have something similar. In epoll.c, a comment
> says "APR_POLLNVAL is not handled by epoll". Or should epoll return
> EPOLLHUP in this case?

I did some more research on this: the case is covered in the epoll(7) 
man page - fds are removed from any containing epoll sets on closure.  
So it is well-defined behaviour, and the "hang" is expected; when all 
the listeners are closed, the poll set becomes empty, so the 
apr_pollset_poll() call will sleep forever, or until interrupted by 
signal!
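This behaviour is easy to check with a few lines of C. A minimal sketch, assuming Linux and using a pipe's read end as a stand-in for a listening socket; `wait_after_close` is a hypothetical name. Strictly, epoll(7) drops an entry once every descriptor referring to the underlying open file has been closed; the pipe end here has only one descriptor, so a single close() is enough.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register a pipe's read end in a fresh epoll set, close it, then wait.
 * The close empties the set, so epoll_wait() reports neither an event
 * nor an error: it sleeps for timeout_ms and returns 0.
 * Returns epoll_wait()'s result, or -1 if setup fails. */
static int wait_after_close(int timeout_ms)
{
    int fds[2];
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out;
    int epfd, n = -1;

    if (pipe(fds) != 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd >= 0) {
        ev.data.fd = fds[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0) {
            close(fds[0]);   /* like the signal handler closing a listener */
            fds[0] = -1;
            /* With timeout_ms == -1 this would sleep until a signal. */
            n = epoll_wait(epfd, &out, 1, timeout_ms);
        }
        close(epfd);
    }
    if (fds[0] >= 0)
        close(fds[0]);
    close(fds[1]);
    return n;
}
```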

select() and poll() will indeed return POLLNVAL for the closed-fds case, 
and prefork needs to check for that.
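By contrast, poll() keeps the entry and flags it. A minimal sketch under the same assumption (a pipe end instead of a listener; `revents_for_closed_fd` is a hypothetical name) shows POLLNVAL being reported even though only POLLIN was requested:

```c
#include <poll.h>
#include <unistd.h>

/* Poll a descriptor number that has just been closed. poll() does not
 * drop the entry; it sets POLLNVAL in revents and counts it as ready.
 * Returns the revents value, or -1 on failure. */
static int revents_for_closed_fd(void)
{
    int fds[2];
    struct pollfd pfd;
    int rc;

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);               /* the descriptor is no longer open */

    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, 0);       /* timeout 0: never block */
    close(fds[1]);
    return rc < 0 ? -1 : pfd.revents;
}
```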

From some brief googling, FreeBSD kqueue appears to have the same 
guarantee.  This PR has some investigation of what happens with Solaris 
ports: http://issues.apache.org/bugzilla/show_bug.cgi?id=42580

For the graceful-stop case, it would be simple enough to just signal any 
dozy children again to wake them up in the wait-for-exit loop, but 
graceful-restart doesn't have that opportunity, so I'm not sure about a 
general solution.  Reducing the poll timeout to some non-infinite time 
would work.

joe


Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

2008-02-01 Thread Stefan Fritsch
Joe Orton wrote:
> I mentioned in the bug that the signal handler could cause undefined
> behaviour, but I'm not sure now whether that is true.  On Linux I can
> reproduce some cases where this will happen, which are all due to
> well-defined behaviour:
>
> 1) with some (default on Linux) accept mutex types,
> apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked
> waiting for the mutex do "hang" until the mutex is released.  Fixing
> this would need some APR work, new interfaces, blah

This is not a problem. On graceful-stop or reload the processes will get
the lock one by one and die (or hang somewhere else). I have never seen a
left over process hanging in this function.

> 2) prefork's apr_pollset_poll() loop-on-EINTR loop was not checking
> die_now; the child holding the mutex will not die immediately if poll
> fails with EINTR, and will hence appear to "hang" until a new connection
> is received.  Fixed by http://svn.apache.org/viewvc?rev=613260&view=rev

IMHO this is the same as 3), as apr_pollset_poll() will be called again
but with all fds already closed.

> I can also reproduce a third case, but I'm not sure about the cause:
>
> 3) apr_pollset_poll() is blocking despite the fact that the listening
> fds are supposedly already closed before entering the syscall.

This is the main problem in my experience.

> I vaguely recall some issue with epoll being mentioned before in the
> context of graceful stop, but I can't find a reference.  Colm?
>
> A very tempting explanation for (3) would be the fact that prefork only
> polls for POLLIN events, not POLLHUP or POLLERR, or indeed that it does
> not check that the returned event really is a POLLIN event; POSIX says
> on poll:
>
> " ... poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in
>  revents if the condition is true, even if the application did not set
>  the corresponding bit in events."
>

I also had problems under Solaris 9 where processes blocked in 
lr->accept_func() if the fd had been closed in the meantime. 
Unfortunately, I cannot reproduce it now even with an unpatched 2.2.6 and
I don't remember which configuration I used. But this could be related to
the returned event not being POLLIN.

> and there's even a comment in the prefork poll code to the effect that
> maybe checking the returned event type would be a good idea.  But from a
> brief play around here, fixing the poll code to DTRT doesn't help.  I
> think more investigation is needed to understand exactly what is going
> on here.
>
> (Also, just to note; I can reproduce (3) even with my patch to dup2
> against the listener fds.)

On Linux with epoll, the hanging processes just block in
apr_pollset_poll(), so checking the return value won't do any good.

Maybe the problem is that (AIUI) poll() returns POLLNVAL if a fd is not
open, while epoll() does not have something similar. In epoll.c, a comment
says "APR_POLLNVAL is not handled by epoll". Or should epoll return
EPOLLHUP in this case?

Stefan



Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

2008-01-18 Thread Joe Orton
On Fri, Jan 04, 2008 at 02:42:05PM +0100, Stefan Fritsch wrote:
> this bug can be quite annoying because of the resources used by the hung
> processes. It happens e.g. under Linux when epoll is used.
> 
> The patch from http://issues.apache.org/bugzilla/show_bug.cgi?id=42829#c14
> has been in Debian unstable/Ubuntu hardy for several weeks and there have
> not been any complaints.

I've been looking into this in more detail; excuse the length of this 
mail.  The symptom in question is described as "children hang after 
graceful restart/stop in 2.2.x".

I mentioned in the bug that the signal handler could cause undefined 
behaviour, but I'm not sure now whether that is true.  On Linux I can 
reproduce some cases where this will happen, which are all due to 
well-defined behaviour:

1) with some (default on Linux) accept mutex types, 
apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked 
waiting for the mutex do "hang" until the mutex is released.  Fixing 
this would need some APR work, new interfaces, blah
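What handing EINTR back to the MPM might look like can be sketched with a POSIX fcntl() record lock standing in for the accept mutex. This is a minimal sketch, not an APR interface: `accept_lock_interruptible`, `accept_unlock` and `accept_lock_or_die` are hypothetical names, the `die_now` flag is assumed to be set by the child's signal handler, and that handler is assumed to be installed without SA_RESTART (with SA_RESTART, Linux restarts F_SETLKW instead of failing with EINTR).

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Take an exclusive fcntl() lock on the whole of lock_fd, waiting if
 * another process holds it. Unlike a lock call that loops on EINTR
 * internally, this hands EINTR back to the caller.
 * Returns 0 on success or an errno value (EINTR if a signal arrived). */
static int accept_lock_interruptible(int lock_fd)
{
    struct flock fl = { 0 };

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;      /* l_start = l_len = 0: the whole file */
    if (fcntl(lock_fd, F_SETLKW, &fl) == -1)
        return errno;
    return 0;
}

static int accept_unlock(int lock_fd)
{
    struct flock fl = { 0 };

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(lock_fd, F_SETLK, &fl) == -1 ? errno : 0;
}

/* MPM-level caller: keep waiting for the lock, but give up as soon as
 * the child has been told to exit, instead of staying blocked behind
 * another child that holds the mutex. */
static int accept_lock_or_die(int lock_fd, volatile sig_atomic_t *die_now)
{
    int rv;

    while ((rv = accept_lock_interruptible(lock_fd)) == EINTR) {
        if (*die_now)
            return EINTR;
    }
    return rv;
}
```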

2) prefork's apr_pollset_poll() loop-on-EINTR loop was not checking 
die_now; the child holding the mutex will not die immediately if poll 
fails with EINTR, and will hence appear to "hang" until a new connection 
is received.  Fixed by http://svn.apache.org/viewvc?rev=613260&view=rev

I can also reproduce a third case, but I'm not sure about the cause:

3) apr_pollset_poll() is blocking despite the fact that the listening 
fds are supposedly already closed before entering the syscall.

I vaguely recall some issue with epoll being mentioned before in the 
context of graceful stop, but I can't find a reference.  Colm?

A very tempting explanation for (3) would be the fact that prefork only 
polls for POLLIN events, not POLLHUP or POLLERR, or indeed that it does 
not check that the returned event really is a POLLIN event; POSIX says 
on poll:

" ... poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in
 revents if the condition is true, even if the application did not set
 the corresponding bit in events."

and there's even a comment in the prefork poll code to the effect that 
maybe checking the returned event type would be a good idea.  But from a 
brief play around here, fixing the poll code to DTRT doesn't help.  I 
think more investigation is needed to understand exactly what is going 
on here.
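Checking the returned event type, as the prefork comment suggests, could look like the helper below. A minimal sketch: `classify_revents` and its enum are hypothetical. It cannot help when the backend is epoll, which, per epoll(7), silently drops a closed descriptor from the set instead of reporting it, which is consistent with the observation that fixing the poll code alone does not cure the hang.

```c
#include <poll.h>

/* Hypothetical states for one listener's poll() entry. */
enum listener_state { LISTENER_IDLE, LISTENER_READY, LISTENER_GONE };

/* Classify revents instead of assuming "poll() returned > 0, so accept()".
 * POSIX lets poll() set POLLHUP, POLLERR and POLLNVAL even when only
 * POLLIN was requested, so those are checked first. */
static enum listener_state classify_revents(short revents)
{
    if (revents & (POLLERR | POLLHUP | POLLNVAL))
        return LISTENER_GONE;    /* closed or broken: do not accept() */
    if (revents & POLLIN)
        return LISTENER_READY;   /* safe to try accept() */
    return LISTENER_IDLE;
}
```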

(Also, just to note; I can reproduce (3) even with my patch to dup2 
against the listener fds.)

joe


Re: PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

2008-01-18 Thread Martin Kraemer
On Fri, Jan 04, 2008 at 02:42:05PM +0100, Stefan Fritsch wrote:
> Hi,
> 
> this bug can be quite annoying because of the resources used by the hung
> processes. It happens e.g. under Linux when epoll is used.
> 
> The patch from http://issues.apache.org/bugzilla/show_bug.cgi?id=42829#c14
> has been in Debian unstable/Ubuntu hardy for several weeks and there have
> not been any complaints.
> 
> It would be nice if you could look at it and commit it to svn.

I can confirm that there are problems with the restart at least on
FreeBSD-4.x/prefork.

On FreeBSD-4.x/prefork I see this after a graceful restart:
--snip--
$ apachectl status

  Apache Server Status for localhost

   Server Version: Apache/2.3.0-dev (Unix) mod_ssl/2.3.0-dev
  OpenSSL/0.9.7d-p1 DAV/2

   Server Built: Jan 16 2008 04:19:11
[..]
   CPU Usage: u4.45313 s4.3125 cu0 cs0 - .00454% CPU load
   .0265 requests/sec - 9 B/second - 372 B/request
   10 requests currently being processed, 7 idle workers

GG_G__GGW...

[...]
--snip--

After another graceful restart, I see
GGGWG...
and the 'G' processes are stuck at state 'G'.
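To spot this symptom from monitoring, a small helper can count the slots in a given state in a mod_status scoreboard string ('G' is mod_status's "Gracefully finishing" state). A minimal sketch; `count_slots` is a hypothetical name, and the test strings are the truncated scoreboard excerpts quoted above. Slots that stay 'G' across several graceful restarts are the candidates for hung children.

```c
#include <stddef.h>

/* Count the scoreboard slots that are in the given state character. */
static size_t count_slots(const char *scoreboard, char state)
{
    size_t n = 0;

    for (; *scoreboard != '\0'; scoreboard++)
        if (*scoreboard == state)
            n++;
    return n;
}
```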

With the patch applied, I no longer see any of the hanging
"gracefully stuck" processes.

So, from my side, I'd +1 the patch (although I understand the intention
of the code, I have not "brain-traced" all code paths, so this is not
a final "code +1" but just an "appears to fix the problem +1").

Anyone else?

   Martin
-- 
<[EMAIL PROTECTED]>| Fujitsu Siemens
http://www.fujitsu-siemens.com/imprint.html | 81730  Munich,  Germany


PR42829: graceful restart with multiple listeners using prefork MPM can result in hung processes

2008-01-04 Thread Stefan Fritsch
Hi,

this bug can be quite annoying because of the resources used by the hung
processes. It happens e.g. under Linux when epoll is used.

The patch from http://issues.apache.org/bugzilla/show_bug.cgi?id=42829#c14
has been in Debian unstable/Ubuntu hardy for several weeks and there have
not been any complaints.

It would be nice if you could look at it and commit it to svn.

Thanks,
Stefan