Adam Jakubek created THRIFT-5127:
------------------------------------

             Summary: Race condition in TNonblockingServer
                 Key: THRIFT-5127
                 URL: https://issues.apache.org/jira/browse/THRIFT-5127
             Project: Thrift
          Issue Type: Bug
          Components: C++ - Library
    Affects Versions: 0.13.0
            Reporter: Adam Jakubek
         Attachments: thrift_deadlock.cpp

When the {{TNonblockingServer::stop}} method is called from a different thread shortly after {{TNonblockingServer::serve}}, the server occasionally fails to terminate.

The following sequence of events has been observed with Thrift 0.13:
# {{TNonblockingServer::serve}} starts spawning listener threads.
# Another thread calls {{TNonblockingServer::stop}} before all listeners are created. A shutdown request is sent only to those IO threads which have already been initialized, not to all of them.
# {{TNonblockingServer::serve}} finishes spawning the remaining listener threads (including the primary IO thread with index 0).
# {{TNonblockingServer::serve}} continues to run despite the stop request, since the main thread and some of the listener threads are still active.

The issue seems to be caused by late initialization of {{TNonblockingIOThread}}'s state. The server's listener threads are spawned in the {{TNonblockingServer::serve}} method (in a nested call to {{registerEvents}}). They finish initializing some of their state in the {{TNonblockingIOThread::run}} method (part of the {{Runnable}} interface). One of the fields initialized at that stage is the {{notificationPipeFDs_}} array, which as far as I can tell is used to pass messages between threads.

It seems that the thread which invokes {{TNonblockingServer::stop}} might attempt to use the notification pipe to request shutdown while the {{notificationPipeFDs_}} descriptor array is still uninitialized. In that case, the message is lost (the {{TNonblockingIOThread::notify}} call returns immediately) and the target thread never exits.

By the way, the {{threadId_}} field of {{TNonblockingIOThread}} is also accessed concurrently by multiple threads without synchronization:
- the field is written in {{TNonblockingIOThread::registerEvents}} after the listener thread is created,
- the field is read in {{TNonblockingIOThread::breakLoop}} when the server is being stopped.

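The lost shutdown request described above can be modelled without Thrift. The following is only a minimal sketch under stated assumptions: {{IoThreadModel}}, {{notifyStopLossy}}, {{notifyStopSticky}} and {{stopsWhenNotifiedEarly}} are hypothetical names, a condition variable stands in for the notification pipe, and a timeout stands in for the hang so that the example terminates. The "sticky" variant shows one possible way to avoid losing the request; it is not necessarily how the library should be fixed.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of an IO thread; none of these names come from Thrift.
class IoThreadModel {
public:
  // Lossy variant: as in the behaviour described in the report, a request
  // that arrives before the channel is initialized is silently dropped.
  bool notifyStopLossy() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!channelReady_) {
      return false;  // message lost, nobody will ever see it
    }
    stopRequested_ = true;
    cv_.notify_all();
    return true;
  }

  // Sticky variant (assumed remedy): the request is recorded
  // unconditionally, so a thread that starts later still observes it.
  void notifyStopSticky() {
    std::lock_guard<std::mutex> lock(mutex_);
    stopRequested_ = true;
    cv_.notify_all();
  }

  // Thread body. Initializes the channel late, then waits for a stop
  // request. Returns true if a stop request was seen and false if the wait
  // timed out (a real server would keep running instead of timing out).
  bool run(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    channelReady_ = true;
    return cv_.wait_for(lock, timeout, [this] { return stopRequested_; });
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool channelReady_ = false;
  bool stopRequested_ = false;
};

// Sends the stop request before the worker thread starts (the problematic
// ordering) and reports whether the worker observed the request.
bool stopsWhenNotifiedEarly(bool sticky) {
  IoThreadModel model;
  if (sticky) {
    model.notifyStopSticky();
  } else {
    model.notifyStopLossy();
  }
  bool stopped = false;
  std::thread worker(
      [&] { stopped = model.run(std::chrono::milliseconds(100)); });
  worker.join();  // join() makes the write to 'stopped' visible here
  return stopped;
}
```

In this model, {{stopsWhenNotifiedEarly(false)}} reports that the worker never saw the request, while {{stopsWhenNotifiedEarly(true)}} reports a clean stop, because the request is stored in state that exists before the thread body runs.
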
I'm attaching sample code which can reproduce the issue (although not deterministically). Some tweaking of the {{STOPPING_THREAD_DELAY}} constant might be necessary to observe the deadlock.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)