Re: Re: Graceful restart not so graceful?

2009-01-12 Thread trawick

On Jan 11, 2009 11:34am, Stefan Fritsch s...@sfritsch.de wrote:

Hi,



thanks for following up on this and sorry for the late response.



On Wednesday 07 January 2009, Jeff Trawick wrote:

 Initial testing of your idea for a timeout was promising.



I couldn't reproduce any hangs under linux with the patch you commited

to trunk.


Thanks for all of your work on this issue!


In my patch I tried to avoid that an idle apache wakes periodically

for no good reasons. But if you don't think that is a problem, please

backport your patch to 2.2.x.


I'd also prefer that prefork doesn't wake up unless it has work, but I  
think the simpler, localized change which was committed will be less  
trouble in the long run.


wrt backporting:
I think that modern Linux and modern Solaris, perhaps the only platforms  
tested so far, both work for the same reason under the covers. Has anyone  
tested with regular poll()? If not, I can try to test another platform or  
at least dumb down APR to bypass epoll/event queue/etc.


Re: Re: Re: Graceful restart not so graceful?

2009-01-12 Thread trawick

On Jan 12, 2009 8:49am, traw...@gmail.com wrote:

wrt backporting:
I think that modern Linux and modern Solaris, perhaps the only platforms  
tested so far, both work for the same reason under the covers. Has anyone  
tested with regular poll()? If not, I can try to test another platform or  
at least dumb down APR to bypass epoll/event queue/etc.


I tested prefork on OpenSolaris with ac_cv_func_port_create=no (ie, use  
poll()) and it worked with/without the patch.
I tested on Leopard (kevent) and it appeared to work with/without the  
patch, but I'm not very confident that I was able to reproduce the right  
timing window.


I've proposed for backport to 2.2.x.


Re: Graceful restart not so graceful?

2009-01-11 Thread Stefan Fritsch
Hi,

thanks for following up on this and sorry for the late response.

On Wednesday 07 January 2009, Jeff Trawick wrote:
 Initial testing of your idea for a timeout was promising.

I couldn't reproduce any hangs under linux with the patch you commited 
to trunk.

In my patch I tried to avoid that an idle apache wakes periodically 
for no good reasons. But if you don't think that is a problem, please 
backport your patch to 2.2.x.

Cheers,
Stefan


Re: Graceful restart not so graceful?

2009-01-07 Thread Joe Orton
On Tue, Jan 06, 2009 at 12:10:25PM -0600, William Rowe wrote:
 Would folks comment on Nathan's, Joe's and Stefan's work on
 
 https://issues.apache.org/bugzilla/show_bug.cgi?id=42829
 
 and offer any comments on why this patch;
 
 https://issues.apache.org/bugzilla/attachment.cgi?id=22822
 
 should not be applied to trunk/ and branches/2.2.x/ in time for
 the next releases?

I never found the time to commit this for a bunch of reasons:

1) I never convinced myself that adding a bunch of complexity to 
prefork, per Stefan's patch, was the right way to address this

2) I never worked out what impact the lack of check on the poll 
descriptor event type (POLLERR etc) had on this, but that needs 
resolving too

3) I hadn't checked whether worker et al had similar issues

4) I think that a simpler solution to this problem would be to add a 
timeout (e.g. 30s) on the child pollset_poll() call so they are 
guaranteed to pop at some point, but I haven't had time to try this or 
work out whether that will suck in some other way.

(Also, my stuff to dup2() against the listener was demonstrably useless, 
per the thread Jeff referenced, so that should definitely not be 
committed, regardless.)

Regards, Joe


Re: Graceful restart not so graceful?

2009-01-07 Thread Jeff Trawick
On Wed, Jan 7, 2009 at 6:03 AM, Joe Orton jor...@redhat.com wrote:

 On Tue, Jan 06, 2009 at 12:10:25PM -0600, William Rowe wrote:
  Would folks comment on Nathan's, Joe's and Stefan's work on
 
  https://issues.apache.org/bugzilla/show_bug.cgi?id=42829
 
  and offer any comments on why this patch;
 
  https://issues.apache.org/bugzilla/attachment.cgi?id=22822
 
  should not be applied to trunk/ and branches/2.2.x/ in time for
  the next releases?

 I never found the time to commit this for a bunch of reasons:

 1) I never convinced myself that adding a bunch of complexity to
 prefork, per Stefan's patch, was the right way to address this

 2) I never worked out what impact the lack of check on the poll
 descriptor event type (POLLERR etc) had on this, but that needs
 resolving too

 3) I hadn't checked whether worker et al had similar issues

 4) I think that a simpler solution to this problem would be to add a
 timeout (e.g. 30s) on the child pollset_poll() call so they are
 guaranteed to pop at some point, but I haven't had time to try this or
 work out whether that will suck in some other way.


Initial testing of your idea for a timeout was promising.  I'll try to test
it with graceful stop as well.

Index: server/mpm/prefork/prefork.c
===
--- server/mpm/prefork/prefork.c(revision 731724)
+++ server/mpm/prefork/prefork.c(working copy)
@@ -540,10 +540,12 @@
 apr_int32_t numdesc;
 const apr_pollfd_t *pdesc;

-/* timeout == -1 == wait forever */
-status = apr_pollset_poll(pollset, -1, numdesc, pdesc);
+/* timeout == 10 seconds to avoid a hang at graceful
restart/stop
+ * caused by the closing of sockets by the signal handler
+ */
+status = apr_pollset_poll(pollset, apr_time_from_sec(10),
numdesc, pdesc);
 if (status != APR_SUCCESS) {
-if (APR_STATUS_IS_EINTR(status)) {
+if (APR_STATUS_IS_TIMEUP(status) ||
APR_STATUS_IS_EINTR(status)) {
 if (one_process  shutdown_pending) {
 return;
 }


Re: Graceful restart not so graceful?

2009-01-07 Thread Plüm, Rüdiger, VF-Group
 

 -Ursprüngliche Nachricht-
 Von: Jeff Trawick [mailto:traw...@gmail.com] 
 Gesendet: Mittwoch, 7. Januar 2009 13:03
 An: dev@httpd.apache.org
 Betreff: Re: Graceful restart not so graceful?
 
 On Wed, Jan 7, 2009 at 6:03 AM, Joe Orton jor...@redhat.com wrote:
 
 
   On Tue, Jan 06, 2009 at 12:10:25PM -0600, William Rowe wrote:
Would folks comment on Nathan's, Joe's and Stefan's work on
   
https://issues.apache.org/bugzilla/show_bug.cgi?id=42829
   
and offer any comments on why this patch;
   
https://issues.apache.org/bugzilla/attachment.cgi?id=22822
   
should not be applied to trunk/ and branches/2.2.x/ 
 in time for
the next releases?
   
   
   I never found the time to commit this for a bunch of reasons:
   
   1) I never convinced myself that adding a bunch of complexity to
   prefork, per Stefan's patch, was the right way to address this
   
   2) I never worked out what impact the lack of check on the poll
   descriptor event type (POLLERR etc) had on this, but that needs
   resolving too
   
   3) I hadn't checked whether worker et al had similar issues
   
   4) I think that a simpler solution to this problem 
 would be to add a
   timeout (e.g. 30s) on the child pollset_poll() call so they are
   guaranteed to pop at some point, but I haven't had time 
 to try this or
   work out whether that will suck in some other way.
 
 
 Initial testing of your idea for a timeout was promising.  
 I'll try to test it with graceful stop as well.
 
 Index: server/mpm/prefork/prefork.c
 == 
 =
 --- server/mpm/prefork/prefork.c(revision 731724)
 +++ server/mpm/prefork/prefork.c(working copy)
 @@ -540,10 +540,12 @@
  apr_int32_t numdesc;
  const apr_pollfd_t *pdesc;
  
 -/* timeout == -1 == wait forever */
 -status = apr_pollset_poll(pollset, -1, 
 numdesc, pdesc);
 +/* timeout == 10 seconds to avoid a hang at 
 graceful restart/stop
 + * caused by the closing of sockets by the 
 signal handler
 + */
 +status = apr_pollset_poll(pollset, 
 apr_time_from_sec(10), numdesc, pdesc);
  if (status != APR_SUCCESS) {
 -if (APR_STATUS_IS_EINTR(status)) {
 +if (APR_STATUS_IS_TIMEUP(status) || 
 APR_STATUS_IS_EINTR(status)) {
  if (one_process  shutdown_pending) {
  return;
  }

Don't we need to add

SAFE_ACCEPT(accept_mutex_off());

before the continue later on in the 

if (APR_STATUS_IS_TIMEUP(status) || APR_STATUS_IS_EINTR(status)) {

block or is it guaranteed that if we allready have the lock
that another SAFE_ACCEPT(accept_mutex_on()); doesn't do any harm?

Regards

Rüdiger



Graceful restart not so graceful?

2009-01-06 Thread William A. Rowe, Jr.
Would folks comment on Nathan's, Joe's and Stefan's work on

https://issues.apache.org/bugzilla/show_bug.cgi?id=42829

and offer any comments on why this patch;

https://issues.apache.org/bugzilla/attachment.cgi?id=22822

should not be applied to trunk/ and branches/2.2.x/ in time for
the next releases?

If I hear no objections by noon Jan 8 I'll apply.  According to
Stefan this current patch is improving the situation.

Bill



Re: Graceful restart not so graceful?

2009-01-06 Thread Jeff Trawick
On Tue, Jan 6, 2009 at 1:10 PM, William A. Rowe, Jr. wr...@rowe-clan.netwrote:

 Would folks comment on Nathan's, Joe's and Stefan's work on

 https://issues.apache.org/bugzilla/show_bug.cgi?id=42829

 and offer any comments on why this patch;

 https://issues.apache.org/bugzilla/attachment.cgi?id=22822

 should not be applied to trunk/ and branches/2.2.x/ in time for
 the next releases?

 If I hear no objections by noon Jan 8 I'll apply.  According to
 Stefan this current patch is improving the situation.

 Bill

 See also http://marc.info/?l=apache-httpd-devm=119945416529706w=2


Re: Graceful restart not so graceful?

2009-01-06 Thread Ruediger Pluem


On 01/06/2009 07:10 PM, William A. Rowe, Jr. wrote:
 Would folks comment on Nathan's, Joe's and Stefan's work on
 
 https://issues.apache.org/bugzilla/show_bug.cgi?id=42829
 
 and offer any comments on why this patch;
 
 https://issues.apache.org/bugzilla/attachment.cgi?id=22822
 
 should not be applied to trunk/ and branches/2.2.x/ in time for
 the next releases?

@@ -573,6 +588,11 @@ static void child_main(int child_num_arg
 apr_int32_t numdesc;
 const apr_pollfd_t *pdesc;

+if (die_now) {
+status = !APR_SUCCESS;
+goto unlock;
+}
+

Hm, what happens if we get the signal exactly here?
Then stop_listening only set die_now to 1. So we reenter the poll and stay in
until we get a new request to OUR process on one of the listeners.
Is this the desired behaviour?

 /* timeout == -1 == wait forever */
 status = apr_pollset_poll(pollset, -1, numdesc, pdesc);
 if (status != APR_SUCCESS) {

How about this one instead:

Index: server/mpm/prefork/prefork.c
===
--- server/mpm/prefork/prefork.c(Revision 731601)
+++ server/mpm/prefork/prefork.c(Arbeitskopie)
@@ -99,6 +99,8 @@
 static int mpm_state = AP_MPMQ_STARTING;
 static ap_pod_t *pod;

+static int dummy_listener;
+
 /*
  * The max child slot ever assigned, preserved across restarts.  Necessary
  * to deal with MaxClients changes across AP_SIG_GRACEFUL restarts.  We
@@ -136,6 +138,8 @@
 #endif /* TPF */

 static volatile int die_now = 0;
+static volatile int listeners_closed = 0;
+static int close_connection = 1;

 #ifdef GPROF
 /*
@@ -304,8 +308,19 @@

 static void stop_listening(int sig)
 {
-ap_close_listeners();
+if (close_connection  !listeners_closed) {
+ap_listen_rec *lr;

+for (lr = ap_listeners; lr; lr = lr-next) {
+apr_os_sock_t fd;
+
+apr_os_sock_get(fd, lr-sd);
+
+dup2(dummy_listener, fd);
+}
+listeners_closed = 1;
+}
+
 /* For a graceful stop, we want the child to exit when done */
 die_now = 1;
 }
@@ -533,6 +548,7 @@
 if (num_listensocks == 1) {
 /* There is only one listener record, so refer to that one. */
 lr = ap_listeners;
+close_connection = 0;
 }
 else {
 /* multiple listening sockets - need to poll */
@@ -540,8 +556,14 @@
 apr_int32_t numdesc;
 const apr_pollfd_t *pdesc;

+if (die_now) {
+status = !APR_SUCCESS;
+goto unlock;
+}
+
 /* timeout == -1 == wait forever */
 status = apr_pollset_poll(pollset, -1, numdesc, pdesc);
+close_connection = 0;
 if (status != APR_SUCCESS) {
 if (APR_STATUS_IS_EINTR(status)) {
 if (one_process  shutdown_pending) {
@@ -595,8 +617,15 @@
 /* if we accept() something we don't want to die, so we have to
  * defer the exit
  */
-status = lr-accept_func(csd, lr, ptrans);
+if (!die_now) {
+status = lr-accept_func(csd, lr, ptrans);
+}
+else {
+status = !APR_SUCCESS;
+}

+unlock:
+close_connection = 1;
 SAFE_ACCEPT(accept_mutex_off());  /* unlock after accept */

 if (status == APR_EGENERAL) {
@@ -612,6 +641,11 @@
  * socket options, file descriptors, and read/write buffers.
  */

+if (die_now  !listeners_closed) {
+ap_close_listeners();
+listeners_closed = 1;
+}
+
 current_conn = ap_run_create_connection(ptrans, ap_server_conf, csd, 
my_child_num, sbh, bucket_alloc);
 if (current_conn) {
 ap_process_connection(current_conn, csd);
@@ -890,6 +924,17 @@
 mpm_state = AP_MPMQ_STOPPING;
 return 1;
 }
+
+if (dummy_listener == 0) {
+dummy_listener = socket(AF_INET, SOCK_STREAM, 0);
+if (dummy_listener  0) {
+rv = errno;
+ap_log_error(APLOG_MARK, APLOG_EMERG, rv, s,
+ Couldn't create dummy listener socket);
+mpm_state = AP_MPMQ_STOPPING;
+return 1;
+}
+}

 #if APR_USE_SYSVSEM_SERIALIZE
 if (ap_accept_lock_mech == APR_LOCK_DEFAULT ||



Regards

Rüdiger

Index: server/mpm/prefork/prefork.c
===
--- server/mpm/prefork/prefork.c	(Revision 731601)
+++ server/mpm/prefork/prefork.c	(Arbeitskopie)
@@ -99,6 +99,8 @@
 static int mpm_state = AP_MPMQ_STARTING;
 static ap_pod_t *pod;
 
+static int dummy_listener;
+
 /*
  * The max child slot ever assigned, preserved across restarts.  Necessary
  * to deal with MaxClients changes across AP_SIG_GRACEFUL