In 2.2, it is expected behavior. The RFC allows the server to close keepalive connections when it wants.
The last time I checked, trunk had a related bug:
https://issues.apache.org/bugzilla/show_bug.cgi?id=43359

Connections waiting for network writes can also be handled as poll events,
but Event's process management wasn't updated to take into account that
connections might be blocked on network I/O with no current worker thread.
So those connections waiting for network writes can also be dropped when
the parent thinks there are too many processes around.

I did a quick scan of the attached patch a while back but didn't commit it,
because I thought it should be changed to record in the scoreboard the
number of Event-handled connections (i.e., connections with no worker
thread) and the kind of event they are waiting on, to facilitate a
mod_status display enhancement. But no Round TUITs for years. I will look
at the patch again and forget the mod_status bells and whistles for now.

On Sun, Apr 25, 2010 at 2:07 PM, Rainer Jung <rainer.j...@kippdata.de> wrote:

> On 23.03.2010 15:30, Jeff Trawick wrote:
>
>> On Tue, Mar 23, 2010 at 10:04 AM, Rainer Jung <rainer.j...@kippdata.de>
>> wrote:
>>
>>> On 23.03.2010 13:34, Jeff Trawick wrote:
>>>
>>>> On Tue, Mar 23, 2010 at 7:19 AM, Rainer Jung <rainer.j...@kippdata.de>
>>>> wrote:
>>>>
>>>>> I can currently reproduce the following problem with the 2.2.15 event
>>>>> MPM under high load:
>>>>>
>>>>> When an httpd child process gets closed due to the max spare threads
>>>>> rule and it holds established client connections for which it has
>>>>> fully received a keep-alive request, but not yet sent any part of the
>>>>> response, it will simply close that connection.
>>>>>
>>>>> Is that expected behaviour? It doesn't seem reproducible for the
>>>>> worker MPM. The behaviour has been observed using extreme spare rules
>>>>> in order to make processes shut down often, but it still seems not
>>>>> right.
>>>>
>>>> Is this the currently-unhandled situation discussed in this thread?
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200711.mbox/%3ccc67648e0711130530h45c2a28ctcd743b2160e22...@mail.gmail.com%3e
>>>>
>>>> Perhaps Event's special handling for keepalive connections results in
>>>> the window being encountered more often?
>>>
>>> I'd say yes. I know from the packet trace that the previous response on
>>> the same connection got "Connection: Keep-Alive". But from the time gap
>>> of about 0.5 seconds between receiving the next request and sending the
>>> FIN, I guess that the child was not already in the process of shutting
>>> down when the previous "Connection: Keep-Alive" response was sent.
>>>
>>> So for me the question is: if the web server already acknowledged the
>>> next request (in our case it's a GET request, and a TCP ACK), should it
>>> delay shutting down the child until the request has been processed and
>>> the response has been sent (and in this case "Connection: close" was
>>> included)?
>>
>> Since the ACK is out of our control, that situation is potentially
>> within the race condition.
>>
>>> For the connections which do not have another request pending, I see no
>>> problem in closing them - although there could be a race condition.
>>> When there's a race (the client sends the next request while the server
>>> sends the FIN), the client doesn't expect the server to handle the
>>> request (that can always happen when a keep-alive connection times out).
>>> In the situation observed it is annoying that the server already
>>> accepted the next request and nevertheless closes the connection
>>> without handling the request.
>>
>> All we can know is whether or not the socket is readable at the point
>> where we want to gracefully exit the process. In keepalive state we'd
>> wait for {timeout, readability, shutdown-event}, and if readable at
>> wakeup then try to process it unless
>> !c->base_server->keep_alive_while_exiting &&
>> ap_graceful_stop_signalled().
>>
>>> I will do some testing around your patch
>>>
>>> http://people.apache.org/~trawick/keepalive.txt
>>
>> I don't think the patch will cover Event. It modifies
>> ap_process_http_connection(); ap_process_http_async_connection() is
>> used with Event unless there are "clogging input filters." I guess
>> the analogous point of processing is inside Event itself.
>>
>> I guess if KeepAliveWhileExiting is enabled (whoops, that's
>> vhost-specific) then Event would need substantially different shutdown
>> logic.
>
> I could now take a second look at it. Directly porting your patch to
> trunk and event is straightforward. There remains a hard problem though:
> the listener thread has a big loop of the form
>
>     while (!listener_may_exit) {
>         apr_pollset_poll(...);
>         while (HANDLE_EVENTS) {
>             if (READABLE_SOCKET)
>                 ...
>             else if (ACCEPT)
>                 ...
>         }
>         HANDLE_KEEPALIVE_TIMEOUTS
>         HANDLE_WRITE_COMPLETION_TIMEOUTS
>     }
>
> Obviously, if we want to respect any previously returned "Connection:
> Keep-Alive" headers, we can't terminate the loop on listener_may_exit.
> As a first try, I switched to:
>
>     while (1) {
>         if (listener_may_exit)
>             ap_close_listeners();
>         apr_pollset_poll(...);
>         REMOVE_LISTENERS_FROM_POLLSET
>         while (HANDLE_EVENTS) {
>             if (READABLE_SOCKET)
>                 ...
>             else if (ACCEPT)
>                 ...
>         }
>         HANDLE_KEEPALIVE_TIMEOUTS
>         HANDLE_WRITE_COMPLETION_TIMEOUTS
>     }
>
> Now the listeners get closed, and in combination with your patch the
> connections will not be dropped, but will instead receive a "Connection:
> close" during the next request.
>
> The while loop now lacks a correct break criterion, though. It would
> need to stop when the pollset is empty (listeners were removed, other
> connections were closed due to end of keep-alive or timeout).
> Unfortunately there is no API function for checking whether there are
> still sockets in the pollset, and it isn't straightforward how to do
> that.
>
> Another possibility would be to wait for a maximum of the vhost
> keepalive timeouts. But that seems to be a bit too much.
>
> Any ideas or comments?
>
> Regards,
>
> Rainer
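To make the check described in the quoted text above a bit more concrete
("process the readable keepalive socket unless a graceful stop is pending
and the vhost did not opt in"), here is a rough sketch. This is not the
actual Event code: should_process_next_request() is a made-up name, and
the keep_alive_while_exiting field on server_rec only exists with the
keepalive.txt patch applied; conn_rec and ap_graceful_stop_signalled()
are stock httpd API.

    #include "httpd.h"      /* conn_rec, server_rec */
    #include "ap_mpm.h"     /* ap_graceful_stop_signalled() */

    /* Decide what to do when a connection sitting in keepalive state
     * becomes readable while the process is being asked to exit.
     * Returns 1 if the pending request should still be processed (its
     * response would then carry "Connection: close"), 0 if the
     * connection should simply be closed.
     */
    static int should_process_next_request(conn_rec *c)
    {
        if (ap_graceful_stop_signalled()
            && !c->base_server->keep_alive_while_exiting) {
            /* graceful stop pending and the vhost did not opt in via
             * KeepAliveWhileExiting (field added by the patch) */
            return 0;
        }
        return 1;
    }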
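As for the missing break criterion: APR indeed offers no call to ask a
pollset how many descriptors it still contains, but a counter kept next to
the pollset can stand in for that, provided every add/remove goes through a
common pair of wrappers. A minimal sketch follows; the names
pollset_add_counted, pollset_remove_counted and in_pollset are made up, and
this is not actual event.c code.

    #include "apr_poll.h"
    #include "apr_atomic.h"

    /* Number of descriptors currently registered in the event pollset.
     * Kept atomic so it stays correct no matter which thread adds or
     * removes descriptors.
     */
    static volatile apr_uint32_t in_pollset = 0;

    static apr_status_t pollset_add_counted(apr_pollset_t *ps,
                                            const apr_pollfd_t *pfd)
    {
        apr_status_t rv = apr_pollset_add(ps, pfd);
        if (rv == APR_SUCCESS) {
            apr_atomic_inc32(&in_pollset);
        }
        return rv;
    }

    static apr_status_t pollset_remove_counted(apr_pollset_t *ps,
                                               const apr_pollfd_t *pfd)
    {
        apr_status_t rv = apr_pollset_remove(ps, pfd);
        if (rv == APR_SUCCESS) {
            apr_atomic_dec32(&in_pollset);
        }
        return rv;
    }

With that in place, the modified while (1) loop could break once the
listeners have been removed and apr_atomic_read32(&in_pollset) reaches
zero, i.e. once every remaining keepalive or write-completion connection
has been closed or has timed out. The maximum KeepAliveTimeout over all
vhosts could still serve as a backstop so that a single stuck connection
cannot delay the child's exit indefinitely.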