OH my.  How hard would it be to fix this?
If you get a patch, could you tell me?

Thanks,

Peter Arseneau.

-----Original Message-----
From: Tianwei [mailto:tianwei.sh...@gmail.com] 
Sent: February 4, 2010 2:21 AM
To: Kostya Serebryany
Cc: dev@httpd.apache.org; Neil Vachharajani
Subject: Re: Is that a race condition bug?

besides, the following warning indicated that developer already have
been aware of the problem and tolerate it:
1. comments in worker.c:
  a:
 896         requests_this_child--; /* FIXME: should be synchronized - aaron */
  b:
 632         /* TODO: requests_this_child should be synchronized - aaron */
 633         if (requests_this_child <= 0) {
 634             check_infinite_requests();
 635         }

2. reported by tsan:
==5876== WARNING: Possible data race during read of size 4 at 0x6FB768: {{{
==5876==    T22 (locks held: {}):
==5876==     #0  listener_thread
/home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:633
==5876==     #1  dummy_worker
/home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
==5876==     #2  ThreadSanitizerStartThread
/home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
==5876==   Concurrent write(s) happened at (OR AFTER) these points:
==5876==    T15 (locks held: {}):
==5876==     #0  worker_thread
/home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:895
==5876==     #1  dummy_worker
/home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
==5876==     #2  ThreadSanitizerStartThread
/home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
==5876==   Address 0x6FB768 is 0 bytes inside data symbol "requests_this_child"
==5876== }}}

Tianwei
On Thu, Feb 4, 2010 at 3:05 PM, Kostya Serebryany <k...@google.com> wrote:
> Looking at the races reported by ThreadSanitizer I can see comments in the
> httpd code which look like this:
>         /* There is an intentional race condition in this design:
>          * in a multithreaded app, one thread might be reading
>          * from this cache_element to resolve a timestamp from
>          * TIME_CACHE_SIZE seconds ago at the same time that
>          * another thread is copying the exploded form of the
>          * current time into the same cache_element. ...
> So, I afraid there are quite a few intentional races in this code.
> It'll be hard to find harmful races among those, unless someone takes time
> to annotate the benign races.
> --kcc
>
> On Thu, Feb 4, 2010 at 9:42 AM, Tianwei <tianwei.sh...@gmail.com> wrote:
>>
>> Hi, httpd developers:
>>    I am using a
>> tool(http://code.google.com/p/data-race-test/wiki/ThreadSanitizer) to
>> study the potential race condition bugs in httpd.
>> I use httpd-2.2.14, and configure with --mpm=worker. For my first
>> round of testing, it actually reports lots of warning.  I am checking
>> one by one manually.  I have analyzed one of the bugs carefully, and
>> do not know if we should classify it as a benign race or not.
>> waring description:
>> 1. I enable the USE_ATOMICS_GENERIC where use the hash lock to protect
>> the apr_atomic_casptr
>> 2. The bug report is:
>> ==5876== WARNING: Possible data race during write of size 8 at 0x429C578:
>> {{{
>> ==5876==    T2 (locks held: {L1}):
>> ==5876==     #0  apr_atomic_casptr
>> /home/tianwei/apache/httpd-2.2.14/srclib/apr/atomic/unix/mutex.c:184
>> ==5876==     #1  ap_queue_info_set_idle
>> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:104
>> ==5876==     #2  worker_thread
>> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:844
>> ==5876==     #3  dummy_worker
>> /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
>> ==5876==     #4  ThreadSanitizerStartThread
>> /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
>> ==5876==   Concurrent read(s) happened at (OR AFTER) these points:
>> ==5876==    T22 (locks held: {}):
>> ==5876==     #0  ap_queue_info_wait_for_idler
>> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:209
>> ==5876==     #1  listener_thread
>> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:642
>> ==5876==     #2  dummy_worker
>> /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142
>> ==5876==     #3  ThreadSanitizerStartThread
>> /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525
>>
>> 3. warning analysis.
>>    listener thread:
>>           apr_status_t ap_queue_info_wait_for_idler(fd_queue_info_t
>> *queue_info,
>>                                          apr_pool_t **recycled_pool)
>> {
>>
>>   /* Block if the count of idle workers is zero */
>>    if (queue_info->idlers == 0) {
>>        rv = apr_thread_mutex_lock(queue_info->idlers_mutex);
>>       while (queue_info->idlers == 0) {
>>            rv = apr_thread_cond_wait(queue_info->wait_for_idler,
>>                                  queue_info->idlers_mutex);
>>       }
>>     rv = apr_thread_mutex_unlock(queue_info->idlers_mutex);
>>   }
>>
>>   apr_atomic_dec32(&(queue_info->idlers));
>>
>>   for (;;) {
>>      struct recycled_pool *first_pool = queue_info->recycled_pools;
>>   ....
>>   }
>> }
>>
>>   worker thread:
>>      apr_status_t ap_queue_info_set_idle(fd_queue_info_t *queue_info,
>>                                    apr_pool_t *pool_to_recycle)
>> {
>>    for (;;) {
>>            //new_recycle->next = queue_info->recycled_pools;
>>            struct recycled_pool *next = queue_info->recycled_pools;
>>            new_recycle->next = next;
>>            if (apr_atomic_casptr((volatile
>> void**)&(queue_info->recycled_pools),
>>                                  new_recycle, next) ==
>>                next) {
>>                break;
>>            }
>>      }
>>   /* If this thread just made the idle worker count nonzero,
>>     * wake up the listener. */
>>    if (prev_idlers == 0) {
>>        rv = apr_thread_mutex_lock(queue_info->idlers_mutex);
>>        if (rv != APR_SUCCESS) {
>>            return rv;
>>        }
>>        rv = apr_thread_cond_signal(queue_info->wait_for_idler);
>>        if (rv != APR_SUCCESS) {
>>            apr_thread_mutex_unlock(queue_info->idlers_mutex);
>>            return rv;
>>        }
>>        rv = apr_thread_mutex_unlock(queue_info->idlers_mutex);
>>        if (rv != APR_SUCCESS) {
>>            return rv;
>>        }
>>    }
>>
>>
>> Problem:
>>   if in listener thread, the queue_info->idlers  is "4", where it
>> will skip the "queue_info->idlers == 0" condition in
>> ap_queue_info_wait_for_idler, and decrease the idlers atomically, then
>> it will read "queue_info->recycled_pools". Then in worker thread, it
>> will write the "queue_info->recycled_pools" in apr_atomic_casptr with
>> a lock.
>> We can see that since the listener thread no longer execute the "lock,
>> cond_wait, unlock", even the worker thread will have a pthread_signal,
>> there is
>> no happen-before relation between the read of
>> "queue_info->recycled_pools" and the write of
>> "queue_info->recycled_pools". According to the definition, there is a
>> race condition here.
>>
>> Question:
>>   According to analysis above, it seems that there is a race,
>> however, I am guessing the developer do not treat it as a problem, or
>> there is other unknown mechanism for me to protect this code? Can you
>> give me some suggestions?
>>
>> Thanks.
>>
>> Tianwei
>> --
>> Sheng, Tianwei
>> Inst. of High Performance Computing
>> Dept. of Computer Sci. & Tech.
>> Tsinghua Univ.
>
>



-- 
Sheng, Tianwei
Inst. of High Performance Computing
Dept. of Computer Sci. & Tech.
Tsinghua Univ.

Reply via email to