Adam Jakubek created THRIFT-5127:
------------------------------------

             Summary: Race condition in TNonblockingServer
                 Key: THRIFT-5127
                 URL: https://issues.apache.org/jira/browse/THRIFT-5127
             Project: Thrift
          Issue Type: Bug
          Components: C++ - Library
    Affects Versions: 0.13.0
            Reporter: Adam Jakubek
         Attachments: thrift_deadlock.cpp

When the {{TNonblockingServer::stop}} method is called on a different thread 
shortly after {{TNonblockingServer::serve}}, the server occasionally fails to 
terminate.

The following sequence of events has been observed with Thrift 0.13:
# {{TNonblockingServer::serve}} starts spawning listener threads.
# Another thread calls {{TNonblockingServer::stop}} before all listeners are 
created.
A shutdown request is sent only to the IO threads that have already been 
initialized (not to all of them).
# {{TNonblockingServer::serve}} completes spawning the remaining listener 
threads (including the primary IO thread with index 0).
# {{TNonblockingServer::serve}} continues to run despite the stop request, 
since the main thread and some of the listener threads are still active.
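The interleaving above can be replayed deterministically on a single thread. The sketch below is a simplified model, not Thrift code: {{ToyIOThread}}, {{threadsLeftRunning}} and the placeholder descriptor value are hypothetical, and the start order (secondary threads first, primary thread 0 last) is an assumption taken from step 3. A thread whose notification descriptor is still {{-1}} when {{stop}} broadcasts simply drops the request.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one listener/IO thread (not Thrift's class).
struct ToyIOThread {
    int notifyFd = -1;            // -1 until run() has initialized the channel
    bool stopRequested = false;

    // Stand-in for the late initialization performed in run().
    void run() { notifyFd = 3; }  // arbitrary placeholder descriptor value

    // Models the reported behaviour: with no channel yet, the message is dropped.
    bool notify() {
        if (notifyFd < 0) return false;
        stopRequested = true;
        return true;
    }
};

// Replays the reported ordering: `startedBeforeStop` threads initialize, then
// stop() broadcasts, then the remaining threads initialize.
// Returns how many threads never receive the stop request.
std::size_t threadsLeftRunning(std::size_t total, std::size_t startedBeforeStop) {
    std::vector<ToyIOThread> threads(total);

    // Assumed start order: secondary threads 1..total-1 first, primary thread 0 last.
    std::vector<std::size_t> order;
    for (std::size_t i = 1; i < total; ++i) order.push_back(i);
    if (total > 0) order.push_back(0);

    std::size_t next = 0;
    // Step 1: serve() starts some of the listener threads.
    for (; next < startedBeforeStop && next < order.size(); ++next)
        threads[order[next]].run();

    // Step 2: stop() notifies every thread; uninitialized ones drop the request.
    for (ToyIOThread& t : threads) t.notify();

    // Step 3: serve() starts the remaining threads.
    for (; next < order.size(); ++next)
        threads[order[next]].run();

    // Step 4: count the threads that will keep running.
    std::size_t running = 0;
    for (const ToyIOThread& t : threads)
        if (!t.stopRequested) ++running;
    return running;
}
```

For example, with four threads of which two have started before {{stop}}, {{threadsLeftRunning(4, 2)}} returns 2: threads 3 and 0 never see the request.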

The issue seems to be caused by late initialization of 
{{TNonblockingIOThread}}'s state.
The server's listener threads are spawned in the {{TNonblockingServer::serve}} 
method (in a nested call to {{registerEvents}}). They finish initializing some 
of their state in the {{TNonblockingIOThread::run}} method (part of the 
{{Runnable}} interface).
One of the fields initialized at that stage is the 
{{notificationPipeFDs_}} array, which, as far as I can tell, is used to pass 
messages between threads.
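For context, a notification pipe lets one thread wake another that is blocked waiting for I/O. The following is a minimal sketch assuming a plain POSIX {{pipe()}} and a one-byte message; the {{'S'}} shutdown byte and {{runPipeShutdownDemo}} are hypothetical, and Thrift's own channel may be created differently and carry a different payload.

```cpp
#include <unistd.h>
#include <atomic>
#include <thread>

// Hypothetical demo: returns true if the worker thread received the shutdown byte.
bool runPipeShutdownDemo() {
    int fds[2];                        // fds[0] = read end, fds[1] = write end
    if (pipe(fds) != 0) return false;

    std::atomic<bool> sawShutdown{false};

    // Worker: blocks in read() until another thread writes to the pipe.
    std::thread worker([rfd = fds[0], &sawShutdown] {
        char msg = 0;
        while (read(rfd, &msg, 1) == 1) {
            if (msg == 'S') {          // hypothetical "shutdown" byte
                sawShutdown = true;
                return;
            }
        }
        // read() returned 0 (EOF) or -1 (error): exit without a shutdown message.
    });

    // Controller: request shutdown by writing one byte.
    const char stop = 'S';
    const bool written = (write(fds[1], &stop, 1) == 1);
    close(fds[1]);                     // once the buffer is drained, read() sees EOF
    worker.join();
    close(fds[0]);
    return written && sawShutdown.load();
}
```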

It seems that the thread which invokes {{TNonblockingServer::stop}} might 
attempt to use the notification pipe to request shutdown while the 
{{notificationPipeFDs_}} descriptor array is still uninitialized.
In that case, the message is lost (the {{TNonblockingIOThread::notify}} call 
will return immediately) and the target thread never exits.
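One way to avoid losing the request, sketched here under assumptions (this is not the upstream fix): keep a "stop pending" flag in the same mutex-protected state as a "channel ready" flag, so a stop that arrives before {{run}} has initialized is recorded and picked up during initialization. {{SafeIOThread}} and its methods are hypothetical, and {{wakeLoop}} only counts wake-ups where real code would write to the notification pipe.

```cpp
#include <mutex>

// Hypothetical sketch, not Thrift code.
class SafeIOThread {
public:
    // Called by the thread that wants to stop the server.
    void requestStop() {
        std::lock_guard<std::mutex> lock(mu_);
        stopPending_ = true;
        if (ready_) {
            wakeLoop();  // channel exists: deliver the wake-up now
        }
        // Otherwise the IO thread sees stopPending_ when it initializes.
    }

    // Called on the IO thread at the start of run().
    // Returns true if the thread should exit immediately.
    bool initializeAndCheckStop() {
        std::lock_guard<std::mutex> lock(mu_);
        ready_ = true;   // stand-in for creating the notification pipe
        return stopPending_;
    }

    int wakeUps() const {
        std::lock_guard<std::mutex> lock(mu_);
        return wakeUps_;
    }

private:
    void wakeLoop() { ++wakeUps_; }  // stand-in for writing to the pipe

    mutable std::mutex mu_;
    bool ready_ = false;
    bool stopPending_ = false;
    int wakeUps_ = 0;
};
```

An alternative that should have a similar effect is to create the notification descriptors before the IO thread is spawned: a message written early would then presumably wait in the pipe until the thread's loop starts reading.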

By the way, the {{threadId_}} field of {{TNonblockingIOThread}} is also accessed 
concurrently by multiple threads without synchronization:
- the field is written in {{TNonblockingIOThread::registerEvents}} after 
creation of the listener thread,
- there is a read in {{TNonblockingIOThread::breakLoop}} when the server is 
being stopped.
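A minimal sketch of removing that data race, assuming a mutex is acceptable on this path: every read and write of the id goes through the same lock. {{IdHolder}} is a hypothetical name, and the sketch uses {{std::thread::id}} rather than whatever type Thrift actually stores. The lock only removes the data race; a reader that runs before the writer can still observe the default ("no thread") id.

```cpp
#include <mutex>
#include <thread>

// Hypothetical holder for a thread id shared between threads.
class IdHolder {
public:
    // Writer side, e.g. right after the listener thread is created.
    void set(std::thread::id id) {
        std::lock_guard<std::mutex> lock(mu_);
        id_ = id;
    }

    // Reader side, e.g. during shutdown: is the caller the stored thread?
    bool isCurrentThread() const {
        std::lock_guard<std::mutex> lock(mu_);
        return id_ == std::this_thread::get_id();
    }

private:
    mutable std::mutex mu_;
    std::thread::id id_;  // default-constructed id represents "no thread"
};
```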

I'm attaching sample code which can reproduce the issue (although not 
deterministically).
Some tweaking of the {{STOPPING_THREAD_DELAY}} constant might be necessary to 
observe the deadlock.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
