[
https://issues.apache.org/jira/browse/THRIFT-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539293#comment-15539293
]
ASF GitHub Bot commented on THRIFT-3932:
----------------------------------------
Github user jeking3 commented on the issue:
https://github.com/apache/thrift/pull/1103
Removing the THRIFT_SLEEP_USEC(1) calls in the test makes it much faster,
which is great. However, as part of this effort I have discovered that our
boost implementation is broken in some way. I haven't found the root cause yet,
but I can easily get concurrency_test to hang on Linux with boost 1.54 and on
Windows with boost 1.58 after correcting the logic in ThreadManagerTests.h /
blockTest. The symptom is that notifyAll() doesn't wake up all the threads it
is supposed to; it seems to wake up about 64 threads before it gives up. If you
run blockTest by itself (comment out the other thread-manager tests in
Tests.cpp) it will hang about 1 out of 2 times, and if you look at things in
the debugger you will see a bunch of threads blocked on blockMonitor_, waiting
for the notification from the test thread. They are properly synchronized, but
not all of them wake up every time.
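To make the expected behaviour concrete, here is a minimal sketch of the same wait/broadcast pattern. It uses std::condition_variable rather than Thrift's Monitor or the boost implementation, and the function name countWokenAfterBroadcast is hypothetical, so treat it as an illustration of what notifyAll() should guarantee, not as a reproduction of blockTest. With a correct condition variable every waiter wakes and the function returns n; with the symptom described above, the join loop would never finish.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Thrift): starts n threads that block on
// one condition variable, broadcasts once, and returns how many woke up.
int countWokenAfterBroadcast(int n) {
  std::mutex m;
  std::condition_variable ready;  // workers -> test thread: "I am waiting"
  std::condition_variable start;  // test thread -> workers: broadcast
  bool go = false;
  int waiting = 0;
  int woken = 0;

  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&] {
      std::unique_lock<std::mutex> lock(m);
      ++waiting;
      ready.notify_one();
      // The predicate guards against spurious wakeups and against a
      // broadcast that happens before this thread starts waiting.
      start.wait(lock, [&] { return go; });
      ++woken;
    });
  }

  {
    // Wait until every worker holds a slot in the wait queue.
    std::unique_lock<std::mutex> lock(m);
    ready.wait(lock, [&] { return waiting == n; });
    go = true;
  }
  start.notify_all();  // the equivalent of notifyAll()

  // If the broadcast failed to wake some waiters, this loop would hang.
  for (auto& t : threads) {
    t.join();
  }
  assert(woken == n);
  return woken;
}
```

Calling countWokenAfterBroadcast(100) exercises more than the roughly 64 waiters mentioned above.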
> C++ ThreadManager has a rare termination race
> ---------------------------------------------
>
> Key: THRIFT-3932
> URL: https://issues.apache.org/jira/browse/THRIFT-3932
> Project: Thrift
> Issue Type: Bug
> Components: C++ - Library
> Reporter: Buğra Gedik
> Assignee: James E. King, III
> Attachments: thrift-patch
>
> Time Spent: 8h
> Remaining Estimate: 0h
>
> {{ThreadManager::join}} calls {{stopImpl(true)}}, which in turn calls
> {{removeWorker(workerCount_);}}. The latter blocks in the loop
> {{while (workerCount_ != workerMaxCount_)}}. Within the {{run}} method of the
> workers, the last thread that detects {{workerCount_ == workerMaxCount_}}
> notifies {{removeWorker}}. The {{run}} method has the following additional
> code that is executed at the very end:
> {code}
> {
>   Synchronized s(manager_->workerMonitor_);
>   manager_->deadWorkers_.insert(this->thread());
>   if (notifyManager) {
>     manager_->workerMonitor_.notify();
>   }
> }
> {code}
> This is an independent synchronized block. Now assume 2 threads. One of them
> has {{notifyManager=true}} as it detected the {{workerCount_ ==
> workerMaxCount_}} condition earlier. It is possible that this thread gets to
> execute the above code block first, {{ThreadManager}}'s {{removeWorker}}
> method unblocks, and eventually {{ThreadManager}}'s {{join}} returns and the
> object is destructed. When the other thread reaches the synchronized block
> above, it will crash, as the manager is not around anymore.
> Besides, {{ThreadManager}} never joins its threads.
> Attached is a patch.
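One way to close this kind of race, sketched below, is for the manager to join every worker thread before its members are destroyed, so that no worker can reach its final synchronized block after the manager is gone. This is only an illustration using std::thread; TinyManager, workerExit, and runAndJoin are hypothetical names, and the sketch is not the attached patch or Thrift's ThreadManager.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a thread manager (not Thrift's ThreadManager).
class TinyManager {
public:
  explicit TinyManager(int workers) {
    for (int i = 0; i < workers; ++i) {
      threads_.emplace_back([this] { workerExit(); });
    }
  }

  // Joining in the destructor guarantees that mutex_ and dead_ outlive
  // every worker that still needs to touch them.
  ~TinyManager() { join(); }

  void join() {
    for (auto& t : threads_) {
      if (t.joinable()) {
        t.join();
      }
    }
  }

  int deadWorkers() {
    std::lock_guard<std::mutex> lock(mutex_);
    return dead_;
  }

private:
  // Plays the role of the final synchronized block in the worker's run().
  void workerExit() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++dead_;
  }

  std::mutex mutex_;
  int dead_ = 0;
  std::vector<std::thread> threads_;  // declared last: starts after the rest
};

// Hypothetical driver: returns how many workers ran their exit block.
int runAndJoin(int workers) {
  TinyManager manager(workers);
  manager.join();
  int dead = manager.deadWorkers();
  assert(dead == workers);
  return dead;
}
```

Because join() waits for the threads themselves rather than for a counter they update, it does not matter which worker reaches its exit block first.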