Re: Re: Graceful restart not so graceful?
On Jan 11, 2009 11:34am, Stefan Fritsch s...@sfritsch.de wrote: Hi, thanks for following up on this and sorry for the late response. On Wednesday 07 January 2009, Jeff Trawick wrote: Initial testing of your idea for a timeout was promising. I couldn't reproduce any hangs under linux with the patch you commited to trunk. Thanks for all of your work on this issue! In my patch I tried to avoid that an idle apache wakes periodically for no good reasons. But if you don't think that is a problem, please backport your patch to 2.2.x. I'd also prefer that prefork doesn't wake up unless it has work, but I think the simpler, localized change which was committed will be less trouble in the long run. wrt backporting: I think that modern Linux and modern Solaris, perhaps the only platforms tested so far, both work for the same reason under the covers. Has anyone tested with regular poll()? If not, I can try to test another platform or at least dumb down APR to bypass epoll/event queue/etc.
Re: Re: Re: Graceful restart not so graceful?
On Jan 12, 2009 8:49am, traw...@gmail.com wrote: wrt backporting: I think that modern Linux and modern Solaris, perhaps the only platforms tested so far, both work for the same reason under the covers. Has anyone tested with regular poll()? If not, I can try to test another platform or at least dumb down APR to bypass epoll/event queue/etc. I tested prefork on OpenSolaris with ac_cv_func_port_create=no (ie, use poll()) and it worked with/without the patch. I tested on Leopard (kevent) and it appeared to work with/without the patch, but I'm not very confident that I was able to reproduce the right timing window. I've proposed for backport to 2.2.x.
Re: Graceful restart not so graceful?
Hi, thanks for following up on this and sorry for the late response. On Wednesday 07 January 2009, Jeff Trawick wrote: Initial testing of your idea for a timeout was promising. I couldn't reproduce any hangs under linux with the patch you commited to trunk. In my patch I tried to avoid that an idle apache wakes periodically for no good reasons. But if you don't think that is a problem, please backport your patch to 2.2.x. Cheers, Stefan
Re: Graceful restart not so graceful?
On Tue, Jan 06, 2009 at 12:10:25PM -0600, William Rowe wrote: Would folks comment on Nathan's, Joe's and Stefan's work on https://issues.apache.org/bugzilla/show_bug.cgi?id=42829 and offer any comments on why this patch; https://issues.apache.org/bugzilla/attachment.cgi?id=22822 should not be applied to trunk/ and branches/2.2.x/ in time for the next releases? I never found the time to commit this for a bunch of reasons: 1) I never convinced myself that adding a bunch of complexity to prefork, per Stefan's patch, was the right way to address this 2) I never worked out what impact the lack of check on the poll descriptor event type (POLLERR etc) had on this, but that needs resolving too 3) I hadn't checked whether worker et al had similar issues 4) I think that a simpler solution to this problem would be to add a timeout (e.g. 30s) on the child pollset_poll() call so they are guaranteed to pop at some point, but I haven't had time to try this or work out whether that will suck in some other way. (Also, my stuff to dup2() against the listener was demonstrably useless, per the thread Jeff referenced, so that should definitely not be committed, regardless.) Regards, Joe
Re: Graceful restart not so graceful?
On Wed, Jan 7, 2009 at 6:03 AM, Joe Orton jor...@redhat.com wrote: On Tue, Jan 06, 2009 at 12:10:25PM -0600, William Rowe wrote: Would folks comment on Nathan's, Joe's and Stefan's work on https://issues.apache.org/bugzilla/show_bug.cgi?id=42829 and offer any comments on why this patch; https://issues.apache.org/bugzilla/attachment.cgi?id=22822 should not be applied to trunk/ and branches/2.2.x/ in time for the next releases? I never found the time to commit this for a bunch of reasons: 1) I never convinced myself that adding a bunch of complexity to prefork, per Stefan's patch, was the right way to address this 2) I never worked out what impact the lack of check on the poll descriptor event type (POLLERR etc) had on this, but that needs resolving too 3) I hadn't checked whether worker et al had similar issues 4) I think that a simpler solution to this problem would be to add a timeout (e.g. 30s) on the child pollset_poll() call so they are guaranteed to pop at some point, but I haven't had time to try this or work out whether that will suck in some other way. Initial testing of your idea for a timeout was promising. I'll try to test it with graceful stop as well. Index: server/mpm/prefork/prefork.c === --- server/mpm/prefork/prefork.c(revision 731724) +++ server/mpm/prefork/prefork.c(working copy) @@ -540,10 +540,12 @@ apr_int32_t numdesc; const apr_pollfd_t *pdesc; -/* timeout == -1 == wait forever */ -status = apr_pollset_poll(pollset, -1, numdesc, pdesc); +/* timeout == 10 seconds to avoid a hang at graceful restart/stop + * caused by the closing of sockets by the signal handler + */ +status = apr_pollset_poll(pollset, apr_time_from_sec(10), numdesc, pdesc); if (status != APR_SUCCESS) { -if (APR_STATUS_IS_EINTR(status)) { +if (APR_STATUS_IS_TIMEUP(status) || APR_STATUS_IS_EINTR(status)) { if (one_process shutdown_pending) { return; }
Re: Graceful restart not so graceful?
-Ursprüngliche Nachricht- Von: Jeff Trawick [mailto:traw...@gmail.com] Gesendet: Mittwoch, 7. Januar 2009 13:03 An: dev@httpd.apache.org Betreff: Re: Graceful restart not so graceful? On Wed, Jan 7, 2009 at 6:03 AM, Joe Orton jor...@redhat.com wrote: On Tue, Jan 06, 2009 at 12:10:25PM -0600, William Rowe wrote: Would folks comment on Nathan's, Joe's and Stefan's work on https://issues.apache.org/bugzilla/show_bug.cgi?id=42829 and offer any comments on why this patch; https://issues.apache.org/bugzilla/attachment.cgi?id=22822 should not be applied to trunk/ and branches/2.2.x/ in time for the next releases? I never found the time to commit this for a bunch of reasons: 1) I never convinced myself that adding a bunch of complexity to prefork, per Stefan's patch, was the right way to address this 2) I never worked out what impact the lack of check on the poll descriptor event type (POLLERR etc) had on this, but that needs resolving too 3) I hadn't checked whether worker et al had similar issues 4) I think that a simpler solution to this problem would be to add a timeout (e.g. 30s) on the child pollset_poll() call so they are guaranteed to pop at some point, but I haven't had time to try this or work out whether that will suck in some other way. Initial testing of your idea for a timeout was promising. I'll try to test it with graceful stop as well. Index: server/mpm/prefork/prefork.c == = --- server/mpm/prefork/prefork.c(revision 731724) +++ server/mpm/prefork/prefork.c(working copy) @@ -540,10 +540,12 @@ apr_int32_t numdesc; const apr_pollfd_t *pdesc; -/* timeout == -1 == wait forever */ -status = apr_pollset_poll(pollset, -1, numdesc, pdesc); +/* timeout == 10 seconds to avoid a hang at graceful restart/stop + * caused by the closing of sockets by the signal handler + */ +status = apr_pollset_poll(pollset, apr_time_from_sec(10), numdesc, pdesc); if (status != APR_SUCCESS) { -if (APR_STATUS_IS_EINTR(status)) { +if (APR_STATUS_IS_TIMEUP(status) || APR_STATUS_IS_EINTR(status)) { if (one_process shutdown_pending) { return; } Don't we need to add SAFE_ACCEPT(accept_mutex_off()); before the continue later on in the if (APR_STATUS_IS_TIMEUP(status) || APR_STATUS_IS_EINTR(status)) { block or is it guaranteed that if we allready have the lock that another SAFE_ACCEPT(accept_mutex_on()); doesn't do any harm? Regards Rüdiger
Graceful restart not so graceful?
Would folks comment on Nathan's, Joe's and Stefan's work on https://issues.apache.org/bugzilla/show_bug.cgi?id=42829 and offer any comments on why this patch; https://issues.apache.org/bugzilla/attachment.cgi?id=22822 should not be applied to trunk/ and branches/2.2.x/ in time for the next releases? If I hear no objections by noon Jan 8 I'll apply. According to Stefan this current patch is improving the situation. Bill
Re: Graceful restart not so graceful?
On Tue, Jan 6, 2009 at 1:10 PM, William A. Rowe, Jr. wr...@rowe-clan.netwrote: Would folks comment on Nathan's, Joe's and Stefan's work on https://issues.apache.org/bugzilla/show_bug.cgi?id=42829 and offer any comments on why this patch; https://issues.apache.org/bugzilla/attachment.cgi?id=22822 should not be applied to trunk/ and branches/2.2.x/ in time for the next releases? If I hear no objections by noon Jan 8 I'll apply. According to Stefan this current patch is improving the situation. Bill See also http://marc.info/?l=apache-httpd-devm=119945416529706w=2
Re: Graceful restart not so graceful?
On 01/06/2009 07:10 PM, William A. Rowe, Jr. wrote: Would folks comment on Nathan's, Joe's and Stefan's work on https://issues.apache.org/bugzilla/show_bug.cgi?id=42829 and offer any comments on why this patch; https://issues.apache.org/bugzilla/attachment.cgi?id=22822 should not be applied to trunk/ and branches/2.2.x/ in time for the next releases? @@ -573,6 +588,11 @@ static void child_main(int child_num_arg apr_int32_t numdesc; const apr_pollfd_t *pdesc; +if (die_now) { +status = !APR_SUCCESS; +goto unlock; +} + Hm, what happens if we get the signal exactly here? Then stop_listening only set die_now to 1. So we reenter the poll and stay in until we get a new request to OUR process on one of the listeners. Is this the desired behaviour? /* timeout == -1 == wait forever */ status = apr_pollset_poll(pollset, -1, numdesc, pdesc); if (status != APR_SUCCESS) { How about this one instead: Index: server/mpm/prefork/prefork.c === --- server/mpm/prefork/prefork.c(Revision 731601) +++ server/mpm/prefork/prefork.c(Arbeitskopie) @@ -99,6 +99,8 @@ static int mpm_state = AP_MPMQ_STARTING; static ap_pod_t *pod; +static int dummy_listener; + /* * The max child slot ever assigned, preserved across restarts. Necessary * to deal with MaxClients changes across AP_SIG_GRACEFUL restarts. We @@ -136,6 +138,8 @@ #endif /* TPF */ static volatile int die_now = 0; +static volatile int listeners_closed = 0; +static int close_connection = 1; #ifdef GPROF /* @@ -304,8 +308,19 @@ static void stop_listening(int sig) { -ap_close_listeners(); +if (close_connection !listeners_closed) { +ap_listen_rec *lr; +for (lr = ap_listeners; lr; lr = lr-next) { +apr_os_sock_t fd; + +apr_os_sock_get(fd, lr-sd); + +dup2(dummy_listener, fd); +} +listeners_closed = 1; +} + /* For a graceful stop, we want the child to exit when done */ die_now = 1; } @@ -533,6 +548,7 @@ if (num_listensocks == 1) { /* There is only one listener record, so refer to that one. */ lr = ap_listeners; +close_connection = 0; } else { /* multiple listening sockets - need to poll */ @@ -540,8 +556,14 @@ apr_int32_t numdesc; const apr_pollfd_t *pdesc; +if (die_now) { +status = !APR_SUCCESS; +goto unlock; +} + /* timeout == -1 == wait forever */ status = apr_pollset_poll(pollset, -1, numdesc, pdesc); +close_connection = 0; if (status != APR_SUCCESS) { if (APR_STATUS_IS_EINTR(status)) { if (one_process shutdown_pending) { @@ -595,8 +617,15 @@ /* if we accept() something we don't want to die, so we have to * defer the exit */ -status = lr-accept_func(csd, lr, ptrans); +if (!die_now) { +status = lr-accept_func(csd, lr, ptrans); +} +else { +status = !APR_SUCCESS; +} +unlock: +close_connection = 1; SAFE_ACCEPT(accept_mutex_off()); /* unlock after accept */ if (status == APR_EGENERAL) { @@ -612,6 +641,11 @@ * socket options, file descriptors, and read/write buffers. */ +if (die_now !listeners_closed) { +ap_close_listeners(); +listeners_closed = 1; +} + current_conn = ap_run_create_connection(ptrans, ap_server_conf, csd, my_child_num, sbh, bucket_alloc); if (current_conn) { ap_process_connection(current_conn, csd); @@ -890,6 +924,17 @@ mpm_state = AP_MPMQ_STOPPING; return 1; } + +if (dummy_listener == 0) { +dummy_listener = socket(AF_INET, SOCK_STREAM, 0); +if (dummy_listener 0) { +rv = errno; +ap_log_error(APLOG_MARK, APLOG_EMERG, rv, s, + Couldn't create dummy listener socket); +mpm_state = AP_MPMQ_STOPPING; +return 1; +} +} #if APR_USE_SYSVSEM_SERIALIZE if (ap_accept_lock_mech == APR_LOCK_DEFAULT || Regards Rüdiger Index: server/mpm/prefork/prefork.c === --- server/mpm/prefork/prefork.c (Revision 731601) +++ server/mpm/prefork/prefork.c (Arbeitskopie) @@ -99,6 +99,8 @@ static int mpm_state = AP_MPMQ_STARTING; static ap_pod_t *pod; +static int dummy_listener; + /* * The max child slot ever assigned, preserved across restarts. Necessary * to deal with MaxClients changes across AP_SIG_GRACEFUL