Joe Orton wrote:
> I mentioned in the bug that the signal handler could cause undefined
> behaviour, but I'm not sure now whether that is true.  On Linux I can
> reproduce some cases where this will happen, which are all due to
> well-defined behaviour:
>
> 1) with some (default on Linux) accept mutex types,
> apr_proc_mutex_lock() will loop on EINTR.  Hence, children blocked
> waiting for the mutex do "hang" until the mutex is released.  Fixing
> this would need some APR work, new interfaces, blah

This is not a problem. On graceful-stop or reload the processes will get
the lock one by one and die (or hang somewhere else). I have never seen
a leftover process hanging in this function.

> 2) prefork's apr_pollset_poll() loop-on-EINTR loop was not checking
> die_now; the child holding the mutex will not die immediately if poll
> fails with EINTR, and will hence appear to "hang" until a new connection
> is received.  Fixed by http://svn.apache.org/viewvc?rev=613260&view=rev

IMHO this is the same as 3), as apr_pollset_poll() will be called again
but with all fds already closed.

> I can also reproduce a third case, but I'm not sure about the cause:
>
> 3) apr_pollset_poll() is blocking despite the fact that the listening
> fds are supposedly already closed before entering the syscall.

This is the main problem in my experience.

> I vaguely recall some issue with epoll being mentioned before in the
> context of graceful stop, but I can't find a reference.  Colm?
>
> A very tempting explanation for (3) would be the fact that prefork only
> polls for POLLIN events, not POLLHUP or POLLERR, or indeed that it does
> not check that the returned event really is a POLLIN event; POSIX says
> on poll:
>
> " ... poll() shall set the POLLHUP, POLLERR, and POLLNVAL flag in
> revents if the condition is true, even if the application did not set
> the corresponding bit in events."
>

I also had problems under Solaris 9 where processes blocked in
lr->accept_func() if the fd had been closed in the meantime.
Unfortunately, I cannot reproduce it now even with an unpatched 2.2.6
and I don't remember which configuration I used. But this could be
related to the returned event not being POLLIN.

> and there's even a comment in the prefork poll code to the effect that
> maybe checking the returned event type would be a good idea.  But from a
> brief play around here, fixing the poll code to DTRT doesn't help.  I
> think more investigation is needed to understand exactly what is going
> on here.
>
> (Also, just to note; I can reproduce (3) even with my patch to dup2
> against the listener fds.)

On Linux with epoll, the hanging processes just block in
apr_pollset_poll(), so checking the return value won't do any good.
Maybe the problem is that (AIUI) poll() returns POLLNVAL if a fd is not
open, while epoll() does not have something similar. In epoll.c, a
comment says "APR_POLLNVAL is not handled by epoll". Or should epoll
return EPOLLHUP in this case?

Stefan
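
PS: The poll()/epoll difference I mean can be sketched with a small
standalone Linux test. This is only an illustration under some
assumptions: it uses a pipe instead of a listening socket, it calls
poll(2) and epoll(7) directly rather than going through APR, and the
helper names poll_closed_fd() and epoll_closed_fd() are made up for
this mail. The first helper polls a descriptor number that has already
been closed; the second registers a descriptor with epoll, closes it,
and then waits with a short timeout.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: poll() a descriptor number that is no longer open.
 * Returns the revents bits, or -1 if setup or poll() itself failed. */
static int poll_closed_fd(void)
{
    int p[2];
    struct pollfd pfd;

    if (pipe(p) != 0)
        return -1;

    memset(&pfd, 0, sizeof(pfd));
    pfd.fd = p[0];
    pfd.events = POLLIN;        /* only POLLIN requested, as in prefork */

    close(p[0]);
    close(p[1]);

    /* The fd is invalid now, so poll() should flag it without blocking. */
    if (poll(&pfd, 1, 100) < 0)
        return -1;
    return pfd.revents;
}

/* Hypothetical helper: register a descriptor with epoll, close it, wait.
 * Returns the number of events from epoll_wait(), or -1 on failure. */
static int epoll_closed_fd(void)
{
    int p[2], ep, n;
    struct epoll_event ev, out;

    if (pipe(p) != 0)
        return -1;

    ep = epoll_create(1);
    if (ep < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) != 0) {
        close(p[0]);
        close(p[1]);
        close(ep);
        return -1;
    }

    /* This was the only descriptor for the read end, so the kernel
     * drops it from the epoll set when it is closed. */
    close(p[0]);

    /* Nothing is reported for the closed fd; the wait just times out. */
    n = epoll_wait(ep, &out, 1, 100);

    close(p[1]);
    close(ep);
    return n;
}
```

On Linux I would expect poll_closed_fd() to return a value with
POLLNVAL set, and epoll_closed_fd() to return 0 after the 100 ms
timeout, i.e. with an infinite timeout it would simply block. Note that
epoll(7) says a descriptor is removed from the set only once all
descriptors referring to the same open file description have been
closed; whether that is what happens with the inherited listeners in
prefork is not something this sketch shows.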