[ https://issues.apache.org/jira/browse/THRIFT-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Buğra Gedik updated THRIFT-3932:
--------------------------------
    Description:
{{ThreadManager::join}} calls {{stopImpl(true)}}, which in turn calls {{removeWorker(workerCount_);}}. The latter blocks in {{while (workerCount_ != workerMaxCount_)}} until the two counts match. Within the {{run}} method of the workers, the last thread that detects {{workerCount_ == workerMaxCount_}} notifies {{removeWorker}}. The {{run}} method has the following additional code that is executed at the very end:
{code}
    {
      Synchronized s(manager_->workerMonitor_);
      manager_->deadWorkers_.insert(this->thread());
      if (notifyManager) {
        manager_->workerMonitor_.notify();
      }
    }
{code}
This is an independent synchronized block. Now assume 2 threads. One of them has {{notifyManager=true}} because it detected the {{workerCount_ == workerMaxCount_}} condition earlier. It is possible that this thread executes the above code first, so that the ThreadManager's {{removeWorker}} method unblocks, the ThreadManager's {{join}} eventually returns, and the object is destructed. When the other thread then reaches the synchronized block above, it will crash, because the manager no longer exists.

Besides, the ThreadManager never joins its threads.

Attached is a small fix that alleviates these problems.

  was:
{{ThreadManager::join}} calls {{stopImpl(true)}}, which in turn calls {{removeWorker(workerCount_);}}. The latter waits until {{while (workerCount_ != workerMaxCount_)}}. In the run method, the last thread that detects {{workerCount_ == workerMaxCount_}} notifies the {{removeWorker}} method. However, the run method has the following additional code that is executed at the very end:
{code}
    {
      Synchronized s(manager_->workerMonitor_);
      manager_->deadWorkers_.insert(this->thread());
      if (notifyManager) {
        manager_->workerMonitor_.notify();
      }
    }
{code}
This is an independent synchronized block. Now assume 2 threads. One of them has {{notifyManager=true}} as it detected the {{workerCount_ == workerMaxCount_}} condition earlier.
It is possible that this thread gets to execute the above code first, and the ThreadManager's {{removeWorker}} method unblocks and eventually the ThreadManager's {{join}} returns and the object destructed. When the other thread reaches the synchronized block above, it will crash, as the manager is not around anymore.

Besides, the ThreadManager never joins its threads.

Attached is a small fix that alleviates the problem.


> C++ ThreadManager has a rare termination race
> ---------------------------------------------
>
>                 Key: THRIFT-3932
>                 URL: https://issues.apache.org/jira/browse/THRIFT-3932
>             Project: Thrift
>          Issue Type: Bug
>            Reporter: Buğra Gedik
>         Attachments: thrift-patch
>
>
> {{ThreadManager::join}} calls {{stopImpl(true)}}, which in turn calls
> {{removeWorker(workerCount_);}}. The latter blocks in {{while (workerCount_
> != workerMaxCount_)}} until the two counts match. Within the {{run}} method
> of the workers, the last thread that detects {{workerCount_ ==
> workerMaxCount_}} notifies {{removeWorker}}. The {{run}} method has the
> following additional code that is executed at the very end:
> {code}
>     {
>       Synchronized s(manager_->workerMonitor_);
>       manager_->deadWorkers_.insert(this->thread());
>       if (notifyManager) {
>         manager_->workerMonitor_.notify();
>       }
>     }
> {code}
> This is an independent synchronized block. Now assume 2 threads. One of them
> has {{notifyManager=true}} because it detected the {{workerCount_ ==
> workerMaxCount_}} condition earlier. It is possible that this thread
> executes the above code first, so that the ThreadManager's {{removeWorker}}
> method unblocks, the ThreadManager's {{join}} eventually returns, and the
> object is destructed. When the other thread then reaches the synchronized
> block above, it will crash, because the manager no longer exists.
> Besides, the ThreadManager never joins its threads.
> Attached is a small fix that alleviates these problems.
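
For illustration only, here is a minimal sketch of one way to close this kind of race: have {{join}} wait for every worker thread to finish completely, not just for a counter to match. The attached patch is not reproduced in this message, so this is not necessarily what it does. {{MiniManager}}, {{deadWorkers()}} and {{deadWorkersAfterJoin}} are hypothetical names, and the sketch uses the C++11 standard library rather than Thrift's own {{Monitor}} and {{Synchronized}} classes.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for a thread manager; NOT the Thrift code.
class MiniManager {
public:
  explicit MiniManager(std::size_t workers) {
    for (std::size_t i = 0; i < workers; ++i) {
      threads_.emplace_back([this] { run(); });
    }
  }

  // The destructor also joins, so the object cannot be destroyed while a
  // worker is still running.
  ~MiniManager() { join(); }

  // Ask the workers to stop, then join each thread. Joining guarantees that
  // every worker has left run(), including its final bookkeeping block,
  // before join() returns.
  void join() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    wake_.notify_all();
    for (std::thread& t : threads_) {
      if (t.joinable()) {
        t.join();
      }
    }
  }

  std::size_t deadWorkers() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return deadWorkers_;
  }

private:
  void run() {
    {
      // Idle until the manager asks the workers to stop.
      std::unique_lock<std::mutex> lock(mutex_);
      wake_.wait(lock, [this] { return stopping_; });
    }
    // Separate final block, analogous to the one quoted in the report: it
    // touches manager state at the very end of the worker's life.
    std::lock_guard<std::mutex> lock(mutex_);
    ++deadWorkers_;
  }

  mutable std::mutex mutex_;
  std::condition_variable wake_;
  bool stopping_ = false;
  std::size_t deadWorkers_ = 0;
  std::vector<std::thread> threads_;  // declared last: workers use the members above
};

// Starts `workers` threads, joins them, and reports how many completed their
// final bookkeeping block by the time join() returned.
std::size_t deadWorkersAfterJoin(std::size_t workers) {
  MiniManager manager(workers);
  manager.join();
  return manager.deadWorkers();
}
```

Because {{join}} in this sketch returns only after each {{std::thread}} has been joined, {{deadWorkersAfterJoin(2)}} returns 2: no worker can still be on its way to the final block when the manager is destroyed. Waiting on a counter alone, as described in the report, leaves a window between the notification and the worker's last access to the manager.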