[ https://issues.apache.org/jira/browse/THRIFT-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Buğra Gedik updated THRIFT-3932:
--------------------------------
    Description: 
{{ThreadManager::join}} calls {{stopImpl(true)}}, which in turn calls {{removeWorker(workerCount_);}}. The latter blocks in the loop {{while (workerCount_ != workerMaxCount_)}} until the two counts match. Within the {{run}} method of the workers, the last thread that detects {{workerCount_ == workerMaxCount_}} notifies {{removeWorker}}. The {{run}} method has the following additional code that is executed at the very end:
{code}
{
  Synchronized s(manager_->workerMonitor_);
  manager_->deadWorkers_.insert(this->thread());
  if (notifyManager) {
    manager_->workerMonitor_.notify();
  }
}
{code}
This is an independent synchronized block. Now assume 2 threads. One of them has {{notifyManager=true}} because it detected the {{workerCount_ == workerMaxCount_}} condition earlier. It is possible that this thread executes the above code block first, {{ThreadManager}}'s {{removeWorker}} method unblocks, and eventually {{ThreadManager}}'s {{join}} returns and the object is destructed. When the other thread reaches the synchronized block above, it accesses {{manager_}} after the manager has been destroyed and will crash.
Besides, {{ThreadManager}} never joins its threads.
Attached is a small fix that addresses these problems.

> C++ ThreadManager has a rare termination race
> ---------------------------------------------
>
>                 Key: THRIFT-3932
>                 URL: https://issues.apache.org/jira/browse/THRIFT-3932
>             Project: Thrift
>          Issue Type: Bug
>            Reporter: Buğra Gedik
>         Attachments: thrift-patch
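
The race described above can be illustrated with a minimal, self-contained sketch. This is not the attached patch and not Thrift's real code: {{MiniManager}}, {{run_demo}}, and every member name below are hypothetical, and {{std::thread}} stands in for Thrift's own thread and monitor classes. It assumes that one way to close the race is for the manager to join every worker thread before its {{join}} returns, so that no worker can reach its final synchronized block after the manager is gone.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for ThreadManager; names and layout are assumptions.
class MiniManager {
public:
  explicit MiniManager(std::size_t n) : workerCount_(n) {
    // All other members are already initialized when the threads start.
    for (std::size_t i = 0; i < n; ++i) {
      workers_.emplace_back([this] { run(); });
    }
  }

  ~MiniManager() { join(); }

  void join() {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      stopping_ = true;
      cv_.notify_all();
      // Analogue of removeWorker: wait until every worker has checked out.
      cv_.wait(lock, [this] { return workerCount_ == 0; });
    }
    // At this point some workers may not have entered their final block yet.
    // Joining them here keeps the manager alive until they are done with it.
    for (std::thread& t : workers_) {
      if (t.joinable()) {
        t.join();
      }
    }
  }

  std::size_t deadWorkers() {
    std::lock_guard<std::mutex> lock(mutex_);
    return deadWorkers_;
  }

private:
  void run() {
    bool notifyManager = false;
    {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait(lock, [this] { return stopping_; });
      --workerCount_;
      notifyManager = (workerCount_ == 0);  // last worker to check out
    }
    // Independent synchronized block, mirroring the end of run() in the report.
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++deadWorkers_;
      if (notifyManager) {
        cv_.notify_all();
      }
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  bool stopping_ = false;
  std::size_t workerCount_;
  std::size_t deadWorkers_ = 0;
  std::vector<std::thread> workers_;  // declared last, started last
};

// Hypothetical helper: starts n workers, stops them, and reports how many
// reached their final block before join() returned.
std::size_t run_demo(std::size_t n) {
  MiniManager manager(n);
  manager.join();
  return manager.deadWorkers();
}
```

Calling {{run_demo(2)}} reproduces the two-thread scenario from the description. If the loop that joins the threads were removed from {{join}}, the manager could be destroyed while the second worker is still on its way to the final block, which is the failure the report describes.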