Re: general/792: race condition with SIGUSR1 graceful restart

Dean Gaudet Thu, 26 Jun 1997 12:40:10 -0700 (PDT)

The following reply was made to PR general/792; it has been noted by GNATS.

From: Dean Gaudet <[EMAIL PROTECTED]>
To: Nathan Kurz <[EMAIL PROTECTED]>
Subject: Re: general/792: race condition with SIGUSR1 graceful restart
Date: Thu, 26 Jun 1997 12:30:13 -0700 (PDT)

 When it's doing deferred_die it doesn't want to take the signal
 immediately because it's in a position where it will possibly receive a
 request.  USR1 will break it out of accept() though with an EINTR and
 it'll die out.  So everything is fine after it hits accept. 
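
 A minimal sketch of the deferred-die pattern described above, not the
 actual http_main.c code. deferred_die and deferred_die_handler are names
 from this thread; install_deferred_handler and accept_or_die are
 hypothetical helpers. It assumes sigaction() without SA_RESTART in place
 of signal(), so that accept() returns EINTR when USR1 arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Set by the handler; the accept loop checks it between calls. */
static volatile sig_atomic_t deferred_die = 0;

/* Deferred handler: only record the request to die. */
static void deferred_die_handler(int sig)
{
    (void)sig;
    deferred_die = 1;
}

/* Hypothetical helper: install the deferred handler for SIGUSR1.
 * sa_flags is 0 (no SA_RESTART), so a blocked accept() is interrupted
 * and fails with EINTR instead of being restarted. */
static int install_deferred_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = deferred_die_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical helper: return a connected descriptor, or -1 if the
 * child was asked to die (or accept() failed for another reason). */
static int accept_or_die(int listen_fd)
{
    for (;;) {
        if (deferred_die)
            return -1;
        /* A signal landing here, after the check but before accept()
         * blocks, is not noticed until accept() returns. */
        int csd = accept(listen_fd, NULL, NULL);
        if (csd >= 0)
            return csd;
        if (errno != EINTR)
            return -1;
        /* EINTR: loop around and re-check deferred_die. */
    }
}
```

 The window marked in the comment is the same kind of gap this PR is
 about: the flag only helps once the process is actually inside accept().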

 Before it hits accept it's in the "die immediately" signal handler.  So
 it's fine from the top of the loop through to the signal() that changes
 the handlers.  In addition it checks the generation through there so it
 catches anything that occurred before the "die" signal was put on. 

 So the only place where there's a problem is between the signal(deferred) 
 call and the accept().  But if we do what you suggest then we run into a
 problem *after* the accept call.  We may have accepted a connection, and
 then get hit with a signal before we can disable the longjmp.  That is
 something I deliberately tried to avoid.  We also can't trust the values
 of any of the local variables, so it'd be hard to know if we accepted a
 connection or if we're just supposed to die or what.

 I've been able to run a "while 1; kill -USR1" loop against the server and
 surf without a broken link with the current code.  But before I had the
 deferred stuff in there I did have the slight race condition I talk about
 above -- where an accept may succeed and then get signalled to death.  And
 when that was in there I did get broken links while surfing. 

 On architectures with serialization in that loop the current code lets one
 child live for at most one more request.  On other architectures, where
 everyone gets plopped into accept() and the OS gets to wake 'em up, it's
 possible for some children to be stuck "gracefully exiting" if the OS
 starves them at the accept(). 

 We could protect that with a timer for 1.2... but I'm thinking of
 serializing to solve PR#467 so it may be moot in the future.  Dunno.  Am I
 on crack?

 Your longjmp thing is clever though.  But I can't think of how to test if
 we really got a connection after the loop exits due to deferred_die being
 set.  For example, we could get the signal after accept() returns but
 before csd is set.  race conditions rule. 
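
 A rough sketch of the "disable the longjmp" idea, under assumptions: the
 handler jumps only while a flag is armed and otherwise just records the
 request. jump_armed, usr1_handler, install_usr1_handler and deliver_usr1
 are hypothetical names, and raise() stands in for a real USR1 from the
 parent. It uses sigsetjmp/siglongjmp rather than the ap_setjmp in the
 suggested fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf die_jump;
static volatile sig_atomic_t jump_armed = 0;   /* may the handler jump? */
static volatile sig_atomic_t deferred_die = 0; /* die request recorded  */

/* Record the request; jump out only while the jump is armed. */
static void usr1_handler(int sig)
{
    (void)sig;
    deferred_die = 1;
    if (jump_armed) {
        jump_armed = 0;
        siglongjmp(die_jump, 1);
    }
}

/* Hypothetical helper: install usr1_handler for SIGUSR1. */
static int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical helper: deliver one SIGUSR1 with the jump armed or not.
 * Returns 1 if control came back via siglongjmp, 0 if the signal was
 * only recorded in deferred_die. */
static int deliver_usr1(int armed)
{
    deferred_die = 0;
    if (sigsetjmp(die_jump, 1) != 0)
        return 1;           /* arrived here from the handler */
    jump_armed = armed;
    /* In a real child the armed window would span accept(). A signal
     * arriving after accept() returns but before the jump is disarmed
     * would abandon a connection whose descriptor was never stored. */
    raise(SIGUSR1);
    jump_armed = 0;
    return 0;
}
```

 Arming and disarming are themselves separate steps from accept(), so the
 sketch narrows the window but does not remove it.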

 Dean

 On Thu, 26 Jun 1997, Nathan Kurz wrote:

 > 
 > >Number:         792
 > >Category:       general
 > >Synopsis:       race condition with SIGUSR1 graceful restart
 > >Confidential:   no
 > >Severity:       non-critical
 > >Priority:       medium
 > >Responsible:    apache (Apache HTTP Project)
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   apache
 > >Arrival-Date:   Thu Jun 26 08:50:01 1997
 > >Originator:     [EMAIL PROTECTED]
 > >Organization:
 > apache
 > >Release:        1.2.0
 > >Environment:
 > any
 > >Description:
 > There is a problem with the signal handling of SIGUSR1 in child_main()
 > in http_main.c around line 1775.  If a SIGUSR1 comes too early in the 
 > for loop it will be ignored and the process will wait in accept.  
 > It's none too critical, but could be improved.
 > >How-To-Repeat:
 > This condition can be tested by putting a pause() or sleep in the 
 > for loop just before the accept and then sending a SIGUSR1 to the 
 > process.
 > >Fix:
 > It needs a long jump.  Something like:
 > 
 > if (ap_setjmp(deferred_die_jump_buffer, 1) == 0) {
 >      signal(SIGUSR1, deferred_die_and_jump_handler);
 > }
 > while (! deferred_die) {
 >      clen = sizeof();        
 >      csd = accept(); 
 >      if (csd >=0 || errno != EINTR) break;
 > }
 > signal(SIGUSR1, deferred_die_handler);
 > >Audit-Trail:
 > >Unformatted:
 > 
 > 
 >
