Situation: worker or event MPM. Process shutdown due to:

- MaxRequestsPerChild
- MaxSpareThreads
- Graceful stop or graceful restart

When an httpd child process shuts down for one of the above reasons, it doesn't respect existing keep-alive connections: if the previous response signalled keep-alive to the client and the next request has not yet arrived, the connection is terminated. This is especially noticeable with event, though I can reproduce it with worker as well.

Based on a patch by Jeff

http://mail-archives.apache.org/mod_mbox/httpd-dev/200711.mbox/%3ccc67648e0711130530h45c2a28ctcd743b2160e22...@mail.gmail.com%3e

who implemented KeepAliveWhileExiting

I worked on a patch for trunk (and another one for 2.2.x) which seems to solve the problem in the above cases. It may also help for prefork, but I didn't investigate whether prefork even has the problem and, if so, whether the core part of Jeff's patch already solves it.

Patch: http://people.apache.org/~rjung/patches/keep-alive-while-exiting-trunk-20100601a.patch

Before I continue working on it, I would like some feedback on whether this makes sense. It is a change in one of the more delicate parts, and I think the patch is now good enough to review.

Some notes:

1) Implementation

The worker threads keep running when the shutdown is graceful. The listener marks the process as quiescing, removes its sockets from the pollset and shuts them down. Then it keeps looping until GracefulShutdownTimeout seconds have passed. The timeout is necessary because the pollset has no API to check whether it is empty, so we don't know whether there are still keep-alive connections.

2) Test

You can set MaxRequestsPerChild to test this, or test the MaxSpareThreads case by choosing a small difference between MinSpareThreads and MaxSpareThreads.

When using ab, a server-aborted connection will result in the following failure type:

Failed requests:        N
   (Connect: 0, Receive: 0, Length: N, Exceptions: 0)

where N is the number of occurrences.

Although MaxRequestsPerChild is a convenient test case, keep in mind that for the worker and event MPMs we count keep-alive connections, not requests. So you have to choose an appropriately low MaxRequestsPerChild to trigger it, especially when testing with "ab -k", which gives you 100 requests per connection (in the default httpd configuration).
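
As a rough illustration of the test setup described above, a worker configuration along the following lines should recycle child processes often enough to hit the problem. All values are made-up examples and not taken from the patch; MaxClients and the <IfModule> wrapper assume a 2.2.x-style configuration.

```apache
# Example values only (assumptions for illustration, not from the patch)
KeepAlive            On
MaxKeepAliveRequests 100   # the httpd default; "ab -k" reuses each connection up to this limit
KeepAliveTimeout     5

<IfModule mpm_worker_module>
    StartServers          2
    ThreadsPerChild      25
    # Small gap between the spare limits, so idle children get shut down early
    MinSpareThreads      25
    MaxSpareThreads      50
    MaxClients          150
    # Low value: worker and event count connections here, not requests
    MaxRequestsPerChild  20
</IfModule>
```

With that in place, something like "ab -k -c 10 -n 10000 http://localhost/" (host and numbers are placeholders) should show the aborted connections as Length failures if the problem is present.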

3) Configuration

The patch uses the same KeepAliveWhileExiting switch that is already part of Jeff's patch. The switch is a per-VHost setting. For most of the above purposes it is used in a global context, because the MPM code often has no vhost in its hands (mostly no connection). To test, you have to set "KeepAliveWhileExiting On" globally and in the VHosts. For a final patch we would need to decide whether KeepAliveWhileExiting becomes global only, or the global part of the patch gets another directive, or the feature is unconditionally switched on.

The other relevant configuration setting is GracefulShutdownTimeout: once that time has passed, the graceful process shutdown is turned into an ungraceful one.
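
For testing, the two settings could be combined roughly as follows. This is only a sketch: KeepAliveWhileExiting exists only with the patch applied, and the timeout value, port and server name are placeholders chosen for the example.

```apache
# Global context: this is what the MPM code sees (requires the patch)
KeepAliveWhileExiting   On

# Stock directive; 30 seconds is an arbitrary example value
GracefulShutdownTimeout 30

<VirtualHost *:80>
    # Placeholder name, not from the original post
    ServerName test.example.com
    # Has to be repeated per VHost with the current patch
    KeepAliveWhileExiting On
</VirtualHost>
```

Choosing the timeout somewhat larger than KeepAliveTimeout seems reasonable, so that idle keep-alive connections can run into their normal timeout before the shutdown becomes ungraceful, but I have not verified that this is sufficient in all cases.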

4) Related problems

The following BZs seem to be related, but I haven't yet checked the influence of the patch on them:

https://issues.apache.org/bugzilla/show_bug.cgi?id=43081
https://issues.apache.org/bugzilla/show_bug.cgi?id=43359

5) Open questions

- Should the behaviour be default? If not, should there be one global KeepAliveWhileExiting?

- Any way of getting the size of a pollset? Or is using GracefulShutdownTimeout as a workaround OK?

- There's possibly stuff that can be removed from the new remove_listeners_from_pollset()

- Should there be a more fine-grained MPM state than AP_MPMQ_STOPPING, like AP_MPMQ_GRACEFUL and AP_MPMQ_STOPPING? This would make the patch a little nicer, especially the parts outside the MPM.

- Anything to add for prefork?

- Other MPMs? I haven't yet investigated the Windows MPM, but since there is only one child, I assume this kind of gracefulness doesn't make much sense there?

Any comments welcome.

Regards,

Rainer
