Hi --

   I've crafted what seems to me like a reasonably minimal set of
patches to deal with the issue I described in this thread:

http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=113986864730305&w=2

   The crux of the problem is that on Linux, when using httpd with
the worker MPM (and probably the event MPM too), hard restarts and
shutdowns often end up sending SIGKILL to httpd child processes
because those processes are waiting for their worker threads to
finish polling on Keep-Alive connections.

   Apparently, on most OSes, if one thread closes a socket descriptor
then other threads polling on it immediately get a return value.
This certainly seems to be the case on Solaris.  But on Linux,
worker threads polling on their sockets in apr_wait_for_io_or_timeout()
don't get an error return value until the full (usually 15 second)
Keep-Alive timeout period is up.  The main httpd process deems that
too long, and issues SIGKILL to the child processes.
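
   The behaviour is easy to probe outside httpd.  What follows is a minimal sketch, assuming a POSIX system with pthreads.  The names `interrupt_poller_with_signal()` and `struct poll_probe`, and the use of SIGUSR1, are hypothetical and chosen only for illustration.  One thread blocks in poll() on one end of a socketpair.  The main thread does not close the descriptor, since that doesn't wake the poller on Linux.  Instead it sends a signal whose do-nothing handler was installed without SA_RESTART, so poll() fails with EINTR and the thread returns at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define WAKE_SIGNAL SIGUSR1   /* hypothetical choice for this sketch */

struct poll_probe {
    int fd;            /* descriptor the thread waits on */
    int timeout_ms;    /* stands in for the Keep-Alive timeout */
    int err;           /* errno if poll() failed, 0 otherwise */
    atomic_int done;   /* set once poll() has returned */
};

/* A handler must be installed (not SIG_IGN) for poll() to be interrupted;
 * it deliberately does nothing. */
static void noop_handler(int sig) { (void)sig; }

static void *poller(void *arg)
{
    struct poll_probe *p = arg;
    struct pollfd pfd = { .fd = p->fd, .events = POLLIN };

    /* Nothing is ever written to the peer, so only the timeout or a
     * signal can end this wait. */
    int rc = poll(&pfd, 1, p->timeout_ms);
    p->err = (rc == -1) ? errno : 0;
    atomic_store(&p->done, 1);
    return NULL;
}

/* Blocks a thread in poll() on one end of a socketpair, then wakes it with
 * WAKE_SIGNAL.  Returns the errno that ended the wait (EINTR expected),
 * 0 if poll() returned normally, or -1 if setup failed. */
static int interrupt_poller_with_signal(int timeout_ms)
{
    struct sigaction sa;
    struct poll_probe probe;
    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    pthread_t t;
    int sv[2];

    assert(timeout_ms > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;     /* no SA_RESTART flag */
    sigemptyset(&sa.sa_mask);
    if (sigaction(WAKE_SIGNAL, &sa, NULL) != 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    probe.fd = sv[0];
    probe.timeout_ms = timeout_ms;
    probe.err = 0;
    atomic_init(&probe.done, 0);
    if (pthread_create(&t, NULL, poller, &probe) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Re-send until the poller reports back, in case a signal lands
     * before the thread has actually entered poll(). */
    while (!atomic_load(&probe.done)) {
        pthread_kill(t, WAKE_SIGNAL);
        nanosleep(&tick, NULL);
    }
    pthread_join(t, NULL);
    close(sv[0]);
    close(sv[1]);
    return probe.err;
}
```

   Calling interrupt_poller_with_signal(5000) should return EINTR well before the 5-second timeout.  The loop re-sends the signal because one delivered before the thread reaches poll() would otherwise be lost.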

   For me personally, the consequence is that all my nice cleanup
handlers registered against the memory pool that's passed during the
child_init stage never get called.  This is particularly painful
if one is hoping to, for example, cleanly shut down DB connections
that one has opened with mod_dbd/apr_dbd.  mod_dbd, for instance,
opens its reslist of apr_dbd connections against the pool passed in
the child_init stage, which with the worker MPM is the pchild pool.
When SIGKILL is applied, the apr_pool_destroy(pchild) call is often
never reached, so the DB connections are never closed; even when I'm
shutting down httpd in a hurry, I'd rather avoid that if at all
possible.

   Without further ado, then, my initial patches.  These are
Unix-only at the moment; I have little experience with other OSes.
If anyone wants to propose something better, and/or suggest
changes, that would be superb.  In the meantime, since these
work for me, I'll start applying them against APR and httpd for
my own use.

   First, the APR patches (against trunk):

===================================
--- include/apr_network_io.h.orig       2006-02-20 16:20:44.841609000 -0500
+++ include/apr_network_io.h    2006-02-20 16:24:19.993533339 -0500
@@ -99,6 +99,7 @@
                                      * until data is available.
                                      * @see apr_socket_accept_filter
                                      */
+#define APR_INTERRUPT_WAIT  65536 /**< Return from IO wait on interrupt */
 
 /** @} */
 
--- network_io/unix/sockopt.c.orig      2006-02-17 11:24:13.058691778 -0500
+++ network_io/unix/sockopt.c   2006-02-17 11:28:08.910410867 -0500
@@ -318,6 +318,9 @@
         return APR_ENOTIMPL;
 #endif
         break;
+    case APR_INTERRUPT_WAIT:
+        apr_set_option(sock, APR_INTERRUPT_WAIT, on);
+        break;
     default:
         return APR_EINVAL;
     }
--- support/unix/waitio.c.orig  2005-07-09 03:07:17.000000000 -0400
+++ support/unix/waitio.c       2006-02-17 11:23:42.620856949 -0500
@@ -49,7 +49,8 @@
 
     do {
         rc = poll(&pfd, 1, timeout);
-    } while (rc == -1 && errno == EINTR);
+    } while (rc == -1 && errno == EINTR &&
+             (f || !apr_is_option_set(s, APR_INTERRUPT_WAIT)));
     if (rc == 0) {
         return APR_TIMEUP;
     }
===================================
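
   In isolation, the APR change amounts to an option bit on the socket plus an EINTR retry loop that honours it.  Here is a minimal sketch of that logic over a plain descriptor.  `struct sketch_socket`, the `sketch_*` helpers and the SKETCH_* status codes are hypothetical stand-ins for apr_socket_t, APR's internal option helpers, and APR_SUCCESS/APR_TIMEUP.  The sketch leaves out the apr_file_t branch (the `f ||` term in the patch), under which file waits keep retrying on EINTR as before.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-ins for APR's option bit and status codes. */
#define SKETCH_INTERRUPT_WAIT 65536
#define SKETCH_SUCCESS 0
#define SKETCH_TIMEUP  (-1)

struct sketch_socket {
    int fd;
    int options;      /* bitmask of SKETCH_* option flags */
    int timeout_ms;
};

static void sketch_set_option(struct sketch_socket *s, int opt, int on)
{
    if (on)
        s->options |= opt;
    else
        s->options &= ~opt;
}

static int sketch_is_option_set(const struct sketch_socket *s, int opt)
{
    return (s->options & opt) == opt;
}

/* Like the patched wait loop: returns SKETCH_SUCCESS when fd is ready,
 * SKETCH_TIMEUP on timeout, or an errno value.  EINTR is retried unless
 * SKETCH_INTERRUPT_WAIT is set, in which case it is handed back. */
static int sketch_wait_for_io(struct sketch_socket *s, int for_read)
{
    struct pollfd pfd;
    int rc;

    assert(s != NULL && s->fd >= 0);
    pfd.fd = s->fd;
    pfd.events = for_read ? POLLIN : POLLOUT;
    do {
        rc = poll(&pfd, 1, s->timeout_ms);
    } while (rc == -1 && errno == EINTR &&
             !sketch_is_option_set(s, SKETCH_INTERRUPT_WAIT));
    if (rc == 0)
        return SKETCH_TIMEUP;
    if (rc > 0)
        return SKETCH_SUCCESS;
    return errno;
}

/* Mirrors close_worker_sockets(): flag the socket first, then close it. */
static void sketch_interrupt_and_close(struct sketch_socket *s)
{
    sketch_set_option(s, SKETCH_INTERRUPT_WAIT, 1);
    close(s->fd);
    s->fd = -1;
}
```

   With no option set the loop behaves exactly as before.  Only callers that explicitly flag a socket, as close_worker_sockets() does, see EINTR come back.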

   Second, the httpd patches (also against trunk):

===================================
--- server/mpm/worker/worker.c.orig     2006-02-20 16:26:55.302701000 -0500
+++ server/mpm/worker/worker.c  2006-02-20 16:46:44.764980568 -0500
@@ -213,6 +213,19 @@
  */
 #define LISTENER_SIGNAL     SIGHUP

+/* The WORKER_SIGNAL signal will be sent from the main thread to the
+ * worker threads after APR_INTERRUPT_WAIT is set true on their sockets.
+ * This ensures that on systems (e.g., Linux) where closing the worker
+ * socket doesn't wake the worker thread when it is polling on the socket
+ * (especially in apr_wait_for_io_or_timeout() when handling
+ * Keep-Alive connections), close_worker_sockets() and join_workers()
+ * still function in a timely manner and allow ungraceful shutdowns to
+ * proceed to completion.  Otherwise join_workers() doesn't return
+ * before the main process decides the child process is non-responsive
+ * and sends a SIGKILL.
+ */
+#define WORKER_SIGNAL       AP_SIG_GRACEFUL
+
 /* An array of socket descriptors in use by each thread used to
  * perform a non-graceful (forced) shutdown of the server. */
 static apr_socket_t **worker_sockets;
@@ -222,6 +235,7 @@
     int i;
     for (i = 0; i < ap_threads_per_child; i++) {
         if (worker_sockets[i]) {
+            apr_socket_opt_set(worker_sockets[i], APR_INTERRUPT_WAIT, 1);
             apr_socket_close(worker_sockets[i]);
             worker_sockets[i] = NULL;
         }
@@ -822,6 +836,11 @@
     ap_scoreboard_image->servers[process_slot][thread_slot].generation = ap_my_generation;
     ap_update_child_status_from_indexes(process_slot, thread_slot, SERVER_STARTING, NULL);

+#ifdef HAVE_PTHREAD_KILL
+    unblock_signal(WORKER_SIGNAL);
+    apr_signal(WORKER_SIGNAL, dummy_signal_handler);
+#endif
+
     while (!workers_may_exit) {
         if (!is_idle) {
             rv = ap_queue_info_set_idle(worker_queue_info, last_ptrans);
@@ -1077,6 +1096,13 @@

     for (i = 0; i < ap_threads_per_child; i++) {
         if (threads[i]) { /* if we ever created this thread */
+#ifdef HAVE_PTHREAD_KILL
+            apr_os_thread_t *worker_os_thread;
+
+            apr_os_thread_get(&worker_os_thread, threads[i]);
+            pthread_kill(*worker_os_thread, WORKER_SIGNAL);
+#endif
+
             rv = apr_thread_join(&thread_rv, threads[i]);
             if (rv != APR_SUCCESS) {
                 ap_log_error(APLOG_MARK, APLOG_CRIT, rv, ap_server_conf,
===================================
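
   Taken together, the httpd side flags each worker socket, closes it, signals each worker thread and then joins it.  The sketch below models that sequence as a standalone program, assuming POSIX threads and C11 atomics.  It is hypothetical throughout:

   - `struct worker`, `shutdown_demo()` and the atomic `interrupt` flag (standing in for APR_INTERRUPT_WAIT) are invented for illustration.
   - SIGUSR2 is an arbitrary choice; the patch uses AP_SIG_GRACEFUL.
   - The patch sends one pthread_kill() per thread, but this join loop re-sends the signal until the worker reports back.  That covers a signal that arrives before the thread has entered poll().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define MAX_WORKERS 8
#define SKETCH_WORKER_SIGNAL SIGUSR2   /* hypothetical; the patch uses AP_SIG_GRACEFUL */

struct worker {
    pthread_t thread;
    int fd[2];              /* fd[0]: idle "Keep-Alive" end, fd[1]: its peer */
    atomic_int interrupt;   /* stands in for APR_INTERRUPT_WAIT */
    atomic_int done;        /* set once the worker's wait has ended */
};

static void dummy_handler(int sig) { (void)sig; }

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    struct pollfd pfd = { .fd = w->fd[0], .events = POLLIN };
    int rc;

    /* A 15 s Keep-Alive style wait; EINTR is retried unless flagged.  A
     * worker that reaches poll() only after its descriptor was closed
     * gets POLLNVAL and returns at once. */
    do {
        rc = poll(&pfd, 1, 15000);
    } while (rc == -1 && errno == EINTR && !atomic_load(&w->interrupt));
    atomic_store(&w->done, 1);
    return NULL;
}

/* Starts n idle workers, then shuts them down in the order the patched
 * worker MPM uses.  Returns the shutdown time in milliseconds.  Setup
 * failures abort() to keep the sketch short. */
static long shutdown_demo(int n)
{
    struct worker workers[MAX_WORKERS];
    struct sigaction sa;
    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    struct timespec start, end;
    int i;

    assert(n > 0 && n <= MAX_WORKERS);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = dummy_handler;        /* no SA_RESTART flag */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SKETCH_WORKER_SIGNAL, &sa, NULL) != 0)
        abort();

    for (i = 0; i < n; i++) {
        struct worker *w = &workers[i];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, w->fd) != 0)
            abort();
        atomic_init(&w->interrupt, 0);
        atomic_init(&w->done, 0);
        if (pthread_create(&w->thread, NULL, worker_main, w) != 0)
            abort();
    }
    nanosleep(&tick, NULL);               /* let the workers block in poll() */
    clock_gettime(CLOCK_MONOTONIC, &start);

    /* close_worker_sockets(): set the flag, then close the socket.  On
     * Linux the close alone may not wake a thread already in poll(). */
    for (i = 0; i < n; i++) {
        atomic_store(&workers[i].interrupt, 1);
        close(workers[i].fd[0]);
    }

    /* join_workers(): signal each worker until its wait ends, then join. */
    for (i = 0; i < n; i++) {
        while (!atomic_load(&workers[i].done)) {
            pthread_kill(workers[i].thread, SKETCH_WORKER_SIGNAL);
            nanosleep(&tick, NULL);
        }
        pthread_join(workers[i].thread, NULL);
        close(workers[i].fd[1]);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    return (long)(end.tv_sec - start.tv_sec) * 1000L
           + (end.tv_nsec - start.tv_nsec) / 1000000L;
}
```

   shutdown_demo(4) should come back in a fraction of a second rather than after the 15-second poll timeout.  That is the property join_workers() needs before the parent gives up and sends SIGKILL.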

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B
