OH my. How hard would it be to fix this? If you get a patch, could you tell me?
Thanks, Peter Arseneau. -----Original Message----- From: Tianwei [mailto:tianwei.sh...@gmail.com] Sent: February 4, 2010 2:21 AM To: Kostya Serebryany Cc: dev@httpd.apache.org; Neil Vachharajani Subject: Re: Is that a race condition bug? besides, the following warning indicated that developer already have been aware of the problem and tolerate it: 1. comments in worker.c: a: 896 requests_this_child--; /* FIXME: should be synchronized - aaron */ b: 632 /* TODO: requests_this_child should be synchronized - aaron */ 633 if (requests_this_child <= 0) { 634 check_infinite_requests(); 635 } 2. reported by tsan: ==5876== WARNING: Possible data race during read of size 4 at 0x6FB768: {{{ ==5876== T22 (locks held: {}): ==5876== #0 listener_thread /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:633 ==5876== #1 dummy_worker /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142 ==5876== #2 ThreadSanitizerStartThread /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525 ==5876== Concurrent write(s) happened at (OR AFTER) these points: ==5876== T15 (locks held: {}): ==5876== #0 worker_thread /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:895 ==5876== #1 dummy_worker /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142 ==5876== #2 ThreadSanitizerStartThread /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525 ==5876== Address 0x6FB768 is 0 bytes inside data symbol "requests_this_child" ==5876== }}} Tianwei On Thu, Feb 4, 2010 at 3:05 PM, Kostya Serebryany <k...@google.com> wrote: > Looking at the races reported by ThreadSanitizer I can see comments in the > httpd code which look like this: > /* There is an intentional race condition in this design: > * in a multithreaded app, one thread might be reading > * from this cache_element to resolve a timestamp from > * TIME_CACHE_SIZE seconds ago at the same time that > * another thread is copying the exploded form of the > * current time into the same cache_element. ... > So, I afraid there are quite a few intentional races in this code. > It'll be hard to find harmful races among those, unless someone takes time > to annotate the benign races. > --kcc > > On Thu, Feb 4, 2010 at 9:42 AM, Tianwei <tianwei.sh...@gmail.com> wrote: >> >> Hi, httpd developers: >> I am using a >> tool(http://code.google.com/p/data-race-test/wiki/ThreadSanitizer) to >> study the potential race condition bugs in httpd. >> I use httpd-2.2.14, and configure with --mpm=worker. For my first >> round of testing, it actually reports lots of warning. I am checking >> one by one manually. I have analyzed one of the bugs carefully, and >> do not know if we should classify it as a benign race or not. >> waring description: >> 1. I enable the USE_ATOMICS_GENERIC where use the hash lock to protect >> the apr_atomic_casptr >> 2. The bug report is: >> ==5876== WARNING: Possible data race during write of size 8 at 0x429C578: >> {{{ >> ==5876== T2 (locks held: {L1}): >> ==5876== #0 apr_atomic_casptr >> /home/tianwei/apache/httpd-2.2.14/srclib/apr/atomic/unix/mutex.c:184 >> ==5876== #1 ap_queue_info_set_idle >> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:104 >> ==5876== #2 worker_thread >> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:844 >> ==5876== #3 dummy_worker >> /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142 >> ==5876== #4 ThreadSanitizerStartThread >> /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525 >> ==5876== Concurrent read(s) happened at (OR AFTER) these points: >> ==5876== T22 (locks held: {}): >> ==5876== #0 ap_queue_info_wait_for_idler >> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/fdqueue.c:209 >> ==5876== #1 listener_thread >> /home/tianwei/apache/httpd-2.2.14/server/mpm/worker/worker.c:642 >> ==5876== #2 dummy_worker >> /home/tianwei/apache/httpd-2.2.14/srclib/apr/threadproc/unix/thread.c:142 >> ==5876== #3 ThreadSanitizerStartThread >> /home/tianwei/valgrind/drt/tsan/ts_valgrind_intercepts.c:525 >> >> 3. warning analysis. >> listener thread: >> apr_status_t ap_queue_info_wait_for_idler(fd_queue_info_t >> *queue_info, >> apr_pool_t **recycled_pool) >> { >> >> /* Block if the count of idle workers is zero */ >> if (queue_info->idlers == 0) { >> rv = apr_thread_mutex_lock(queue_info->idlers_mutex); >> while (queue_info->idlers == 0) { >> rv = apr_thread_cond_wait(queue_info->wait_for_idler, >> queue_info->idlers_mutex); >> } >> rv = apr_thread_mutex_unlock(queue_info->idlers_mutex); >> } >> >> apr_atomic_dec32(&(queue_info->idlers)); >> >> for (;;) { >> struct recycled_pool *first_pool = queue_info->recycled_pools; >> .... >> } >> } >> >> worker thread: >> apr_status_t ap_queue_info_set_idle(fd_queue_info_t *queue_info, >> apr_pool_t *pool_to_recycle) >> { >> for (;;) { >> //new_recycle->next = queue_info->recycled_pools; >> struct recycled_pool *next = queue_info->recycled_pools; >> new_recycle->next = next; >> if (apr_atomic_casptr((volatile >> void**)&(queue_info->recycled_pools), >> new_recycle, next) == >> next) { >> break; >> } >> } >> /* If this thread just made the idle worker count nonzero, >> * wake up the listener. */ >> if (prev_idlers == 0) { >> rv = apr_thread_mutex_lock(queue_info->idlers_mutex); >> if (rv != APR_SUCCESS) { >> return rv; >> } >> rv = apr_thread_cond_signal(queue_info->wait_for_idler); >> if (rv != APR_SUCCESS) { >> apr_thread_mutex_unlock(queue_info->idlers_mutex); >> return rv; >> } >> rv = apr_thread_mutex_unlock(queue_info->idlers_mutex); >> if (rv != APR_SUCCESS) { >> return rv; >> } >> } >> >> >> Problem: >> if in listener thread, the queue_info->idlers is "4", where it >> will skip the "queue_info->idlers == 0" condition in >> ap_queue_info_wait_for_idler, and decrease the idlers atomically, then >> it will read "queue_info->recycled_pools". Then in worker thread, it >> will write the "queue_info->recycled_pools" in apr_atomic_casptr with >> a lock. >> We can see that since the listener thread no longer execute the "lock, >> cond_wait, unlock", even the worker thread will have a pthread_signal, >> there is >> no happen-before relation between the read of >> "queue_info->recycled_pools" and the write of >> "queue_info->recycled_pools". According to the definition, there is a >> race condition here. >> >> Question: >> According to analysis above, it seems that there is a race, >> however, I am guessing the developer do not treat it as a problem, or >> there is other unknown mechanism for me to protect this code? Can you >> give me some suggestions? >> >> Thanks. >> >> Tianwei >> -- >> Sheng, Tianwei >> Inst. of High Performance Computing >> Dept. of Computer Sci. & Tech. >> Tsinghua Univ. > > -- Sheng, Tianwei Inst. of High Performance Computing Dept. of Computer Sci. & Tech. Tsinghua Univ.