[jira] [Commented] (THRIFT-3932) C++ ThreadManager has a rare termination race

ASF GitHub Bot (JIRA) Sat, 01 Oct 2016 16:00:53 -0700

    [ 
https://issues.apache.org/jira/browse/THRIFT-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539293#comment-15539293
 ]


ASF GitHub Bot commented on THRIFT-3932:
----------------------------------------

Github user jeking3 commented on the issue:

    https://github.com/apache/thrift/pull/1103
  
    Removing the THRIFT_SLEEP_USEC(1) calls from the test makes it much faster, 
which is great; however, as part of this effort I have discovered that our boost 
implementation is broken in some way.  I haven't found the root cause yet, 
but I can easily get concurrency_test to hang on Linux with boost 1.54 and on 
Windows with boost 1.58 after fixing the logic in ThreadManagerTests.h / 
blockTest to be correct.  The symptom is that notifyAll() doesn't wake up all 
the threads it is supposed to; it seems to wake up about 64 threads and then 
stops.  If you run blockTest by itself (comment out the other thread-manager 
tests in Tests.cpp), it hangs about 1 out of every 2 runs, and if you inspect 
the threads in the debugger you will see a bunch of them blocked on 
blockMonitor_, waiting for the notification from the test thread.  They are 
properly synchronized, but not all of them wake up every time.
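The property blockTest relies on can also be checked in isolation. The sketch below is a hypothetical stand-in: runNotifyAllCheck and its variables are not Thrift code. It uses std::mutex and std::condition_variable instead of Thrift's Monitor or Boost. It parks a configurable number of threads on one condition variable, broadcasts once, and then joins them. A correct notify_all() must wake every waiter, so a broken implementation shows up as a hang in the final join, not as a wrong return value.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for blockTest: `waiters` threads block on one condition
// variable until the calling thread broadcasts. Uses std:: primitives, not
// Thrift's Monitor or Boost. Returns how many threads returned from the wait.
int runNotifyAllCheck(int waiters) {
  std::mutex m;
  std::condition_variable released_cv;  // broadcast target
  std::condition_variable blocked_cv;   // tells the caller a thread has parked
  int blocked = 0;        // threads that have reached the wait
  bool released = false;  // wake-up predicate, guarded by m
  int woken = 0;          // threads that returned from the wait

  std::vector<std::thread> threads;
  for (int i = 0; i < waiters; ++i) {
    threads.emplace_back([&] {
      std::unique_lock<std::mutex> lk(m);
      ++blocked;
      blocked_cv.notify_one();
      // The predicate overload re-checks `released` after every wake-up,
      // so a spurious wake-up cannot release a thread early.
      released_cv.wait(lk, [&] { return released; });
      ++woken;
    });
  }

  {
    std::unique_lock<std::mutex> lk(m);
    // Each waiter holds m from ++blocked until wait() releases it, so once
    // blocked == waiters every thread is parked inside wait().
    blocked_cv.wait(lk, [&] { return blocked == waiters; });
    released = true;  // set the predicate under the lock, then broadcast
  }
  released_cv.notify_all();

  // If notify_all() failed to wake some waiters, this loop would hang,
  // which matches the symptom described above.
  for (auto& t : threads) t.join();
  assert(blocked == waiters);
  return woken;
}
```

For example, `runNotifyAllCheck(128)` should return 128, which is well above the roughly 64 wake-ups reported above. Setting the predicate under the lock before broadcasting is the standard way to make the handoff immune to both spurious and early wake-ups.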


> C++ ThreadManager has a rare termination race
> ---------------------------------------------
>
>                 Key: THRIFT-3932
>                 URL: https://issues.apache.org/jira/browse/THRIFT-3932
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>            Reporter: Buğra Gedik
>            Assignee: James E. King, III
>         Attachments: thrift-patch
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> {{ThreadManager::join}} calls {{stopImpl(true)}}, which in turn calls 
> {{removeWorker(workerCount_);}}. The latter waits in a 
> {{while (workerCount_ != workerMaxCount_)}} loop. Within the {{run}} method 
> of the workers, the last thread that detects 
> {{workerCount_ == workerMaxCount_}} notifies {{removeWorker}}. The {{run}} 
> method also executes the following code at the very end:
> {code}
>     {
>       Synchronized s(manager_->workerMonitor_);
>       manager_->deadWorkers_.insert(this->thread());
>       if (notifyManager) {
>         manager_->workerMonitor_.notify();
>       }
>     }
> {code}
> This is an independent synchronized block. Now assume two worker threads. 
> One of them has {{notifyManager=true}} because it detected the 
> {{workerCount_ == workerMaxCount_}} condition earlier. This thread may 
> execute the code block above first; {{ThreadManager}}'s {{removeWorker}} 
> method then unblocks, {{ThreadManager}}'s {{join}} eventually returns, and 
> the object is destroyed. When the other thread reaches the synchronized 
> block above, it will crash, because {{manager_}} no longer exists.
> In addition, {{ThreadManager}} never joins its threads.
> Attached is a patch.
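The issue notes that ThreadManager never joins its threads. One way to close this kind of race is to make shutdown wait for each worker thread to exit, instead of waiting for a single "last worker" notification. The sketch below uses a hypothetical MiniManager class built on std::thread, std::mutex and std::condition_variable. It is neither Thrift's ThreadManager nor the attached patch. std::thread::join() returns only after the thread function has finished. As a result, the epilogue in run() that touches manager state (the analogue of the deadWorkers_ block quoted above) always completes before stop() returns, so destroying the manager afterwards is safe.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical MiniManager, not Thrift's ThreadManager: shuts down by joining
// every worker, so no worker can touch manager state after stop() returns.
class MiniManager {
 public:
  explicit MiniManager(std::size_t workers) {
    // All members are constructed before this body runs, so workers may
    // safely use them immediately.
    for (std::size_t i = 0; i < workers; ++i) {
      threads_.emplace_back([this] { run(); });
    }
  }

  ~MiniManager() { stop(); }

  // Signals all workers, then joins each one. Calling it twice is harmless:
  // joined threads are no longer joinable.
  void stop() {
    {
      std::lock_guard<std::mutex> lk(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : threads_) {
      if (t.joinable()) t.join();
    }
    // Every worker has fully exited, so reading without the lock is safe here.
    assert(deadWorkers_ == threads_.size());
  }

  std::size_t deadWorkers() {
    std::lock_guard<std::mutex> lk(mutex_);
    return deadWorkers_;
  }

 private:
  void run() {
    {
      std::unique_lock<std::mutex> lk(mutex_);
      cv_.wait(lk, [this] { return stopping_; });
    }
    // Epilogue analogous to the deadWorkers_ block: it touches manager state,
    // so the manager must outlive it, which join() in stop() guarantees.
    std::lock_guard<std::mutex> lk(mutex_);
    ++deadWorkers_;
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  bool stopping_ = false;
  std::size_t deadWorkers_ = 0;
  std::vector<std::thread> threads_;
};
```

Usage, for example: after `MiniManager m(8); m.stop();`, `m.deadWorkers()` reports 8 and `m` can be destroyed safely. The destructor also calls stop(), so simply letting the manager go out of scope has the same effect.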



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
