[ 
https://issues.apache.org/jira/browse/THRIFT-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940792#comment-16940792
 ] 

Fei Sun commented on THRIFT-4963:
---------------------------------

The problem is that worker threads use the non-blocking socket returned by 
socketpair to notify the IO thread that a task is finished (they call poll 
without a timeout to wait until the socket is writable), while IO threads add 
tasks to the pending task queue and wait forever if the queue is full. If the 
socket write buffer and the task queue are both full, each side waits on the 
other and the server deadlocks.
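
The socket half of this can be shown in isolation. The following is only a 
minimal sketch, not the Thrift code: the helper names (makeNotifyPair, 
fillSendBuffer, drain, writableWithin) are made up for illustration, and 
AF_UNIX is an assumption about the socketpair domain. It fills the write 
buffer of a non-blocking socketpair and then polls for POLLOUT with a timeout. 
With a timeout of -1, as in the notify path described above, the same poll 
call would not return until the other end reads.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>

// Hypothetical helper: create a socketpair and make both ends non-blocking.
void makeNotifyPair(int fds[2]) {
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
  assert(rc == 0);
  for (int i = 0; i < 2; ++i) {
    int flags = fcntl(fds[i], F_GETFL, 0);
    rc = fcntl(fds[i], F_SETFL, flags | O_NONBLOCK);
    assert(rc == 0);
  }
  (void)rc;  // silence unused warnings when asserts are compiled out
}

// Write until the send buffer is full, i.e. write() stops accepting data
// (EAGAIN/EWOULDBLOCK on a non-blocking socket). Returns bytes queued.
long fillSendBuffer(int fd) {
  char chunk[1024] = {0};
  long total = 0;
  for (;;) {
    ssize_t n = write(fd, chunk, sizeof(chunk));
    if (n > 0) { total += n; continue; }
    if (n < 0 && errno == EINTR) continue;
    break;
  }
  return total;
}

// Read everything currently queued on fd. Returns bytes drained.
long drain(int fd) {
  char chunk[4096];
  long total = 0;
  for (;;) {
    ssize_t n = read(fd, chunk, sizeof(chunk));
    if (n > 0) { total += n; continue; }
    if (n < 0 && errno == EINTR) continue;
    break;
  }
  return total;
}

// True if fd becomes writable within timeoutMs milliseconds.
// Passing -1 would wait forever, which is the case that can deadlock.
bool writableWithin(int fd, int timeoutMs) {
  pollfd p;
  p.fd = fd;
  p.events = POLLOUT;
  p.revents = 0;
  int rc = poll(&p, 1, timeoutMs);
  return rc > 0 && (p.revents & POLLOUT) != 0;
}
```

After fillSendBuffer(fds[0]), writableWithin(fds[0], 100) is expected to 
return false until the peer calls drain(fds[1]). In the server, the peer is 
the IO thread, which cannot read because it is itself stuck waiting for room 
in the task queue.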

I modified the C++ Thrift tutorial to demonstrate this; see the attached 
[^CppClient.cpp] and [^CppServer.cpp].

To reproduce the deadlock:
 # Set the TCP write buffer size to 2048
 # Run the server: ./server -io_thread_num 1 -pending_tasks 100 -task_timeout 60000 -worker_thread_num 10
 # Run the client: ./client -conn_timeout 60000 -recv_timeout 60000 -send_timeout 60000 -thread_num 2000
 # The server quickly falls into deadlock. After the client exits, run 
another client with thread_num=1; that client will time out.
 # Attach gdb to the server and you will find a call stack like this: 
!deadlock_thread.png!

> TNonblockingServer blocked int addTask(IOThread) and notify(workerThread)
> -------------------------------------------------------------------------
>
>                 Key: THRIFT-4963
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4963
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.12.0
>            Reporter: chenguang9239
>            Priority: Major
>         Attachments: CppClient.cpp, CppServer.cpp, deadlock_thread.png
>
>
> Hello!
> When using the C++ TNonblockingServer (with a thread pool), I found that it 
> blocks under high QPS.
> I used pstack to print the thread stacks and found the worker thread and the 
> IO thread blocked as follows:
> The worker thread calls notifyIOThread when it handles an expired task, then 
> calls TNonblockingIOThread::notify and waits for POLLOUT in poll without a 
> timeout. 
> The IO thread calls addTask when it gets requests, and IO threads 
> lock threadManager->mutex_ in addTask without a timeout.
> Is this a bug in Thrift 0.12.0?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
