Hi all, Recently we (Ivan Zhakov and I) found several issues in the code responsible for shutting down a mpm_winnt's child process.
The attached patch should fix these issues: (Please note that the changes are combined into a single patch to make the review easier, but I'll commit them independently.) (1) As a part of the process of shutting down worker threads, the code around child.c:1170 currently posts an amount of I/O completion packets equal to the amount of the threads blocked on the I/O completion port. Then it sleeps until all these threads "acknowledge" the completion packets by decrementing the global amount of blocked threads. This can be improved to avoid unnecessary Sleep()'s, and make the shutdown faster. There is no need to block until the threads actually receive the completion, as the shutdown process includes a separate step that waits until the threads exit. Instead of synchronizing on the amount of the threads blocked on the I/O completion port, we can send the number of IOCP_SHUTDOWN completion packets equal to the total amount of threads and immediately proceed to the next step. (2) If the shutdown routine hits a timeout while waiting for the worker threads to exit, it uses TerminateThread() to terminate the remaining threads. Using TerminateThread() can have dangerous consequences such as deadlocks — say, if the the thread is terminated while holding a lock or a heap lock in the middle of HeapAlloc(), as these locks would not be released. Or it can corrupt the application state and cause a crash. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717 Presumably, a much better alternative here would be to leave the cleanup to the operating system by calling TerminateProcess(). (3) Assuming (2) is in place, the code around child.c:1170 that waits for multiple thread handles in batches can be simplified. With (2), there is no difference between ending the wait with one or multiple remaining threads. (Since we terminate the process if at least one thread is still active when we hit a timeout.) Therefore, instead of making an effort to evenly distribute and batch the handles with WaitForMultipleObjects(), we could just start from one end, and wait for one thread handle at a time. Note that apart from this, the code around child.c:1146 that shuts down the listeners, waits, and then proceeds to shutting down the worker threads is prone to a subtle race. Since the wait interval is hardcoded to 1 second, there could still be an accepted connection after it. And, as the worker threads are being shut down, it is feasible that this accepted connection won't ever find a suitable worker thread. (Eventually, it is going to be ungracefully closed). I think that this can be fixed by properly waiting for the listener threads to exit, and in the meanwhile, this would avoid having the Sleep(1000) call altogether. But this is something that I have now left for future work. I would greatly appreciate if someone could review or comment on the proposed changes. If anyone has an additional insight on the design or planned, but unaccomplished changes to the mpm_winnt module, I would appreciate hearing them as well. Thanks! Regards, Evgeny Kotkov
Index: server/mpm/winnt/child.c =================================================================== --- server/mpm/winnt/child.c (revision 1801135) +++ server/mpm/winnt/child.c (working copy) @@ -827,18 +827,6 @@ static DWORD __stdcall worker_main(void *thread_nu } -static void cleanup_thread(HANDLE *handles, int *thread_cnt, - int thread_to_clean) -{ - int i; - - CloseHandle(handles[thread_to_clean]); - for (i = thread_to_clean; i < ((*thread_cnt) - 1); i++) - handles[i] = handles[i + 1]; - (*thread_cnt)--; -} - - /* * child_main() * Entry point for the main control thread for the child process. @@ -890,7 +878,6 @@ void child_main(apr_pool_t *pconf, DWORD parent_pi HANDLE *child_handles; int listener_started = 0; int threads_created = 0; - int watch_thread; int time_remains; int cld; DWORD tid; @@ -1162,16 +1149,16 @@ void child_main(apr_pool_t *pconf, DWORD parent_pi /* Shutdown the worker threads * Post worker threads blocked on the ThreadDispatch IOCompletion port */ - while (g_blocked_threads > 0) { + if (g_blocked_threads > 0) { ap_log_error(APLOG_MARK, APLOG_DEBUG, APR_SUCCESS, ap_server_conf, APLOGNO(00361) "Child: %d threads blocked on the completion port", g_blocked_threads); - for (i=g_blocked_threads; i > 0; i--) { - PostQueuedCompletionStatus(ThreadDispatchIOCP, 0, - IOCP_SHUTDOWN, NULL); - } - Sleep(1000); } + + for (i = 0; i < threads_created; i++) { + PostQueuedCompletionStatus(ThreadDispatchIOCP, 0, IOCP_SHUTDOWN, NULL); + } + /* Empty the accept queue of completion contexts */ apr_thread_mutex_lock(qlock); while (qhead) { @@ -1185,82 +1172,62 @@ void child_main(apr_pool_t *pconf, DWORD parent_pi * (no more than the global server timeout period which * we track in msec remaining). */ - watch_thread = 0; time_remains = (int)(ap_server_conf->timeout / APR_TIME_C(1000)); while (threads_created) { - int nFailsafe = MAXIMUM_WAIT_OBJECTS; + HANDLE handle = child_handles[threads_created - 1]; DWORD dwRet; - /* Every time we roll over to wait on the first group - * of MAXIMUM_WAIT_OBJECTS threads, take a breather, - * and infrequently update the error log. - */ - if (watch_thread >= threads_created) { - if ((time_remains -= 100) < 0) - break; - - /* Every 30 seconds give an update */ - if ((time_remains % 30000) == 0) { - ap_log_error(APLOG_MARK, APLOG_NOTICE, APR_SUCCESS, - ap_server_conf, APLOGNO(00362) - "Child: Waiting %d more seconds " - "for %d worker threads to finish.", - time_remains / 1000, threads_created); - } - /* We'll poll from the top, 10 times per second */ - Sleep(100); - watch_thread = 0; + if (time_remains < 0) + break; + /* Every 30 seconds give an update */ + if ((time_remains % 30000) == 0) { + ap_log_error(APLOG_MARK, APLOG_NOTICE, APR_SUCCESS, + ap_server_conf, APLOGNO(00362) + "Child: Waiting %d more seconds " + "for %d worker threads to finish.", + time_remains / 1000, threads_created); } - /* Fairness, on each iteration we will pick up with the thread - * after the one we just removed, even if it's a single thread. - * We don't block here. - */ - dwRet = WaitForMultipleObjects(min(threads_created - watch_thread, - MAXIMUM_WAIT_OBJECTS), - child_handles + watch_thread, 0, 0); - - if (dwRet == WAIT_FAILED) { - break; - } + dwRet = WaitForSingleObject(handle, 100); + time_remains -= 100; if (dwRet == WAIT_TIMEOUT) { - /* none ready */ - watch_thread += MAXIMUM_WAIT_OBJECTS; - continue; + /* Keep waiting */ } - else if (dwRet >= WAIT_ABANDONED_0) { - /* We just got the ownership of the object, which - * should happen at most MAXIMUM_WAIT_OBJECTS times. - * It does NOT mean that the object is signaled. - */ - if ((nFailsafe--) < 1) - break; + else if (dwRet == WAIT_OBJECT_0) { + CloseHandle(handle); + threads_created--; } else { - watch_thread += (dwRet - WAIT_OBJECT_0); - if (watch_thread >= threads_created) - break; - cleanup_thread(child_handles, &threads_created, watch_thread); + break; } } - /* Kill remaining threads off the hard way */ if (threads_created) { ap_log_error(APLOG_MARK, APLOG_NOTICE, APR_SUCCESS, ap_server_conf, APLOGNO(00363) - "Child: Terminating %d threads that failed to exit.", + "Child: Waiting for %d threads timed out, terminating process.", threads_created); - } - for (i = 0; i < threads_created; i++) { - int *idx; - TerminateThread(child_handles[i], 1); - CloseHandle(child_handles[i]); - /* Reset the scoreboard entry for the thread we just whacked */ - idx = apr_hash_get(ht, &child_handles[i], sizeof(HANDLE)); - if (idx) { - ap_update_child_status_from_indexes(0, *idx, SERVER_DEAD, NULL); + for (i = 0; i < threads_created; i++) { + /* Reset the scoreboard entries for the threads. */ + int *idx = apr_hash_get(ht, &child_handles[i], sizeof(HANDLE)); + if (idx) { + ap_update_child_status_from_indexes(0, *idx, SERVER_DEAD, NULL); + } } + /* We can't wait for any longer, but still have some threads remaining. + * + * The only thing we can do is terminate the process and let the OS + * perform the required cleanup. Terminate with exit code 0, as we do + * not want the parent process to restart the child. Note that we use + * TerminateProcess() instead of ExitProcess(), as the latter function + * causes all DLLs to be unloaded, and it can potentially deadlock if + * a DLL unload handler tries to acquire the same lock that is being + * held by one of the remaining worker threads. + * + * See https://msdn.microsoft.com/en-us/library/windows/desktop/ms682658 + */ + TerminateProcess(GetCurrentProcess(), 0); } ap_log_error(APLOG_MARK, APLOG_NOTICE, APR_SUCCESS, ap_server_conf, APLOGNO(00364) "Child: All worker threads have exited.");