Hi,

the issue was discovered as part of tracing down a deadlock condition in
an SVN test [1].

As far as I understand it, the problem lies in the C-standard not
explicitly defining when a function registered with atexit() is called
in the context of thread termination [2].

The case is reproducible with the following code inside a DLL and
executing that from a calling process (a complete repro project is
available on my issue tracker [1]):

    atexit(apr_terminate);
    apr_initialize();

    apr_pool_t *pool;
    apr_pool_create_ex(&pool, NULL, NULL, NULL);
    apr_thread_pool_t *thread_pool;
    apr_thread_pool_create(&thread_pool, 0, 16, pool);
    apr_thread_pool_idle_wait_set(thread_pool, 1000 * 1000);

    for (int i = 0; i < 3; ++i) {
      apr_thread_pool_push(thread_pool, foo, nullptr, 0, NULL);
    }

In this particular case we create a couple of threads and before the
threads terminate the calling process terminates. At this point
apr_terminate() is called which got registered via the atexit()-call (as
suggested in the apr_terminate-documentation). When that call frees up
the threadpool, it gets stuck in thread_pool_cleanup() waiting for the
thd_cnt to become 0 (which it never will).

I believe the attached patch (done against the apr-util 1.5 branch)
should solve the underlying problem. It determines the number of failed
thread joins (i.e. those which were either detached or did not exit
correctly) and reduces the thd_cnt variable by that amount.

The test case I used on Windows is fixed with that patch and I'll run
the full apr-test suite later today (if I don't get back to this mail it
means the test suite passed for me with that patch).

I also reviewed the implementation of apr_thread_join() on other
platforms and could not spot any immediate problems the patch would
cause on these platforms. However, since I'm quite new to the APR
development and also am not have only marginal knowledge with platforms
other than Windows, I'd appreciate if someone knowing the details about
APR's thread design and who is more familiar with other platforms could
review the patch/approach before I commit the patch.


Regards,
Stefan

[1] http://www.luke1410.de:8090/browse/MAXSVN-94
[2]
https://stackoverflow.com/questions/39655868/what-does-the-posix-standard-say-about-thread-stacks-in-atexit-handlers-what
Index: misc/apr_thread_pool.c
===================================================================
--- misc/apr_thread_pool.c      (revision 1749154)
+++ misc/apr_thread_pool.c      (working copy)
@@ -838,10 +838,13 @@
     apr_thread_mutex_unlock(me->lock);
 
     n_dbg = 0;
+    apr_size_t numincompletejoins = 0;
     if (NULL != (head = elt)) {
         while (elt) {
             tail = elt;
-            apr_thread_join(&rv, elt->thd);
+            if (APR_SUCCESS != apr_thread_join(&rv, elt->thd)) {
+              ++numincompletejoins;
+            }
             elt = APR_RING_NEXT(elt, link);
             ++n_dbg;
         }
@@ -848,6 +851,8 @@
         apr_thread_mutex_lock(me->lock);
         APR_RING_SPLICE_TAIL(me->recycled_thds, head, tail,
                              apr_thread_list_elt, link);
+        assert(me->thd_cnt >= numincompletejoins);
+        me->thd_cnt -= numincompletejoins;
         apr_thread_mutex_unlock(me->lock);
     }
     assert(cnt == n_dbg);

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature

Reply via email to