[RFC/PATCH] mpm_winnt: Fix several issues in the child process shutdown logic

Evgeny Kotkov Fri, 07 Jul 2017 03:19:41 -0700

Hi all,

Recently we (Ivan Zhakov and I) found several issues in the code responsible
for shutting down a mpm_winnt's child process.


The attached patch should fix these issues:

 (Please note that the changes are combined into a single patch to make the
  review easier, but I'll commit them independently.)

 (1) As a part of the process of shutting down worker threads, the code
     around child.c:1170 currently posts an amount of I/O completion packets
     equal to the amount of the threads blocked on the I/O completion port.
     Then it sleeps until all these threads "acknowledge" the completion
     packets by decrementing the global amount of blocked threads.

     This can be improved to avoid unnecessary Sleep()'s, and make the
     shutdown faster.  There is no need to block until the threads actually
     receive the completion, as the shutdown process includes a separate step
     that waits until the threads exit.  Instead of synchronizing on the
     amount of the threads blocked on the I/O completion port, we can send
     the number of IOCP_SHUTDOWN completion packets equal to the total
     amount of threads and immediately proceed to the next step.

  (2) If the shutdown routine hits a timeout while waiting for the worker
      threads to exit, it uses TerminateThread() to terminate the remaining
      threads.

      Using TerminateThread() can have dangerous consequences such as
      deadlocks — say, if the the thread is terminated while holding a lock
      or a heap lock in the middle of HeapAlloc(), as these locks would not
      be released.  Or it can corrupt the application state and cause a crash.
      See https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717

      Presumably, a much better alternative here would be to leave the cleanup
      to the operating system by calling TerminateProcess().

  (3) Assuming (2) is in place, the code around child.c:1170 that waits for
      multiple thread handles in batches can be simplified.  With (2), there
      is no difference between ending the wait with one or multiple remaining
      threads.  (Since we terminate the process if at least one thread is
      still active when we hit a timeout.)

      Therefore, instead of making an effort to evenly distribute and batch
      the handles with WaitForMultipleObjects(), we could just start from
      one end, and wait for one thread handle at a time.

Note that apart from this, the code around child.c:1146 that shuts down the
listeners, waits, and then proceeds to shutting down the worker threads is
prone to a subtle race.  Since the wait interval is hardcoded to 1 second,
there could still be an accepted connection after it.  And, as the worker
threads are being shut down, it is feasible that this accepted connection
won't ever find a suitable worker thread.  (Eventually, it is going to be
ungracefully closed).  I think that this can be fixed by properly waiting for
the listener threads to exit, and in the meanwhile, this would avoid having
the Sleep(1000) call altogether.  But this is something that I have now left
for future work.

I would greatly appreciate if someone could review or comment on the proposed
changes.  If anyone has an additional insight on the design or planned, but
unaccomplished changes to the mpm_winnt module, I would appreciate hearing
them as well.

Thanks!


Regards,
Evgeny Kotkov

Index: server/mpm/winnt/child.c
===================================================================
--- server/mpm/winnt/child.c    (revision 1801135)
+++ server/mpm/winnt/child.c    (working copy)
@@ -827,18 +827,6 @@ static DWORD __stdcall worker_main(void *thread_nu
 }
 
 
-static void cleanup_thread(HANDLE *handles, int *thread_cnt,
-                           int thread_to_clean)
-{
-    int i;
-
-    CloseHandle(handles[thread_to_clean]);
-    for (i = thread_to_clean; i < ((*thread_cnt) - 1); i++)
-        handles[i] = handles[i + 1];
-    (*thread_cnt)--;
-}
-
-
 /*
  * child_main()
  * Entry point for the main control thread for the child process.
@@ -890,7 +878,6 @@ void child_main(apr_pool_t *pconf, DWORD parent_pi
     HANDLE *child_handles;
     int listener_started = 0;
     int threads_created = 0;
-    int watch_thread;
     int time_remains;
     int cld;
     DWORD tid;
@@ -1162,16 +1149,16 @@ void child_main(apr_pool_t *pconf, DWORD parent_pi
     /* Shutdown the worker threads
      * Post worker threads blocked on the ThreadDispatch IOCompletion port
      */
-    while (g_blocked_threads > 0) {
+    if (g_blocked_threads > 0) {
         ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, ap_server_conf, 
APLOGNO(00361)
                      "Child: %d threads blocked on the completion port",
                      g_blocked_threads);
-        for (i=g_blocked_threads; i > 0; i--) {
-            PostQueuedCompletionStatus(ThreadDispatchIOCP, 0,
-                                       IOCP_SHUTDOWN, NULL);
-        }
-        Sleep(1000);
     }
+
+    for (i = 0; i < threads_created; i++) {
+        PostQueuedCompletionStatus(ThreadDispatchIOCP, 0, IOCP_SHUTDOWN, NULL);
+    }
+
     /* Empty the accept queue of completion contexts */
     apr_thread_mutex_lock(qlock);
     while (qhead) {
@@ -1185,82 +1172,62 @@ void child_main(apr_pool_t *pconf, DWORD parent_pi
      * (no more than the global server timeout period which
      * we track in msec remaining).
      */
-    watch_thread = 0;
     time_remains = (int)(ap_server_conf->timeout / APR_TIME_C(1000));
 
     while (threads_created)
     {
-        int nFailsafe = MAXIMUM_WAIT_OBJECTS;
+        HANDLE handle = child_handles[threads_created - 1];
         DWORD dwRet;
 
-        /* Every time we roll over to wait on the first group
-         * of MAXIMUM_WAIT_OBJECTS threads, take a breather,
-         * and infrequently update the error log.
-         */
-        if (watch_thread >= threads_created) {
-            if ((time_remains -= 100) < 0)
-                break;
-
-            /* Every 30 seconds give an update */
-            if ((time_remains % 30000) == 0) {
-                ap_log_error(APLOG_MARK, APLOG_NOTICE, APR_SUCCESS,
-                             ap_server_conf, APLOGNO(00362)
-                             "Child: Waiting %d more seconds "
-                             "for %d worker threads to finish.",
-                             time_remains / 1000, threads_created);
-            }
-            /* We'll poll from the top, 10 times per second */
-            Sleep(100);
-            watch_thread = 0;
+        if (time_remains < 0)
+            break;
+        /* Every 30 seconds give an update */
+        if ((time_remains % 30000) == 0) {
+            ap_log_error(APLOG_MARK, APLOG_NOTICE, APR_SUCCESS,
+                         ap_server_conf, APLOGNO(00362)
+                         "Child: Waiting %d more seconds "
+                         "for %d worker threads to finish.",
+                         time_remains / 1000, threads_created);
         }
 
-        /* Fairness, on each iteration we will pick up with the thread
-         * after the one we just removed, even if it's a single thread.
-         * We don't block here.
-         */
-        dwRet = WaitForMultipleObjects(min(threads_created - watch_thread,
-                                           MAXIMUM_WAIT_OBJECTS),
-                                       child_handles + watch_thread, 0, 0);
-
-        if (dwRet == WAIT_FAILED) {
-            break;
-        }
+        dwRet = WaitForSingleObject(handle, 100);
+        time_remains -= 100;
         if (dwRet == WAIT_TIMEOUT) {
-            /* none ready */
-            watch_thread += MAXIMUM_WAIT_OBJECTS;
-            continue;
+            /* Keep waiting */
         }
-        else if (dwRet >= WAIT_ABANDONED_0) {
-            /* We just got the ownership of the object, which
-             * should happen at most MAXIMUM_WAIT_OBJECTS times.
-             * It does NOT mean that the object is signaled.
-             */
-            if ((nFailsafe--) < 1)
-                break;
+        else if (dwRet == WAIT_OBJECT_0) {
+            CloseHandle(handle);
+            threads_created--;
         }
         else {
-            watch_thread += (dwRet - WAIT_OBJECT_0);
-            if (watch_thread >= threads_created)
-                break;
-            cleanup_thread(child_handles, &threads_created, watch_thread);
+            break;
         }
     }
 
-    /* Kill remaining threads off the hard way */
     if (threads_created) {
         ap_log_error(APLOG_MARK, APLOG_NOTICE, APR_SUCCESS, ap_server_conf, 
APLOGNO(00363)
-                     "Child: Terminating %d threads that failed to exit.",
+                     "Child: Waiting for %d threads timed out, terminating 
process.",
                      threads_created);
-    }
-    for (i = 0; i < threads_created; i++) {
-        int *idx;
-        TerminateThread(child_handles[i], 1);
-        CloseHandle(child_handles[i]);
-        /* Reset the scoreboard entry for the thread we just whacked */
-        idx = apr_hash_get(ht, &child_handles[i], sizeof(HANDLE));
-        if (idx) {
-            ap_update_child_status_from_indexes(0, *idx, SERVER_DEAD, NULL);
+        for (i = 0; i < threads_created; i++) {
+            /* Reset the scoreboard entries for the threads. */
+            int *idx = apr_hash_get(ht, &child_handles[i], sizeof(HANDLE));
+            if (idx) {
+                ap_update_child_status_from_indexes(0, *idx, SERVER_DEAD, 
NULL);
+            }
         }
+        /* We can't wait for any longer, but still have some threads remaining.
+         *
+         * The only thing we can do is terminate the process and let the OS
+         * perform the required cleanup. Terminate with exit code 0, as we do
+         * not want the parent process to restart the child. Note that we use
+         * TerminateProcess() instead of ExitProcess(), as the latter function
+         * causes all DLLs to be unloaded, and it can potentially deadlock if
+         * a DLL unload handler tries to acquire the same lock that is being
+         * held by one of the remaining worker threads.
+         *
+         * See 
https://msdn.microsoft.com/en-us/library/windows/desktop/ms682658
+         */
+        TerminateProcess(GetCurrentProcess(), 0);
     }
     ap_log_error(APLOG_MARK, APLOG_NOTICE, APR_SUCCESS, ap_server_conf, 
APLOGNO(00364)
                  "Child: All worker threads have exited.");

[RFC/PATCH] mpm_winnt: Fix several issues in the child process shutdown logic

Reply via email to