[ 
https://issues.apache.org/jira/browse/THRIFT-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067673#comment-17067673
 ] 

Kanishth Karthik edited comment on THRIFT-4963 at 3/27/20, 5:42 AM:
--------------------------------------------------------------------

I encountered the same issue in thrift 0.10.0 and it still persists in thrift 
0.14.0. I have created a pull request to resolve this. 

The problem is that the IO thread is stuck trying to acquire the lock on the 
first line of add (as shown in the image above). That lock is held by a worker 
thread that is handling an expired task and trying to notify the IO thread. If 
the buffer is full, the worker cannot notify and waits indefinitely. Neither 
thread gives way to the other, resulting in a deadlock.

This can be resolved by releasing the lock before closing the connection and 
notifying the IO thread, then reacquiring it once that is done, just as is 
already done when processing a task that has not expired.

Refer: ThreadManager::Task::run() for the same.
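
To illustrate the idea, here is a minimal sketch of the "release the lock 
around the blocking callback" pattern. It is not the Thrift code or the actual 
patch: MiniQueue, Task, onExpire, body and runOne are hypothetical names, and 
the queue is a simplified stand-in for ThreadManager, assuming a single mutex 
guards the task list.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical, simplified task: not Thrift's ThreadManager::Task.
struct Task {
  bool expired;
  std::function<void()> onExpire;  // e.g. close connection + notify; may block
  std::function<void()> body;      // normal processing
};

// Hypothetical stand-in for a thread-pool queue guarded by one mutex.
class MiniQueue {
public:
  // Called by the "IO thread"; it must acquire mutex_ to enqueue.
  void add(Task t) {
    std::lock_guard<std::mutex> guard(mutex_);
    tasks_.push_back(std::move(t));
  }

  // Called by a "worker". Returns false if there was nothing to do.
  bool runOne() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (tasks_.empty()) {
      return false;
    }
    Task t = std::move(tasks_.front());
    tasks_.pop_front();

    // Release the lock before any callback that may block, for the
    // expired case as well as the normal case.
    lock.unlock();
    if (t.expired) {
      if (t.onExpire) t.onExpire();
    } else if (t.body) {
      t.body();
    }
    lock.lock();  // reacquire to update shared state

    assert(lock.owns_lock());
    ++handled_;
    return true;
  }

  int handled() {
    std::lock_guard<std::mutex> guard(mutex_);
    return handled_;
  }

private:
  std::mutex mutex_;
  std::deque<Task> tasks_;
  int handled_ = 0;
};
```

Because the mutex is not held while onExpire runs, a call to add() made during 
that callback can proceed instead of waiting on the worker. If runOne() kept 
the lock across the callback, such a call would never get the mutex.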


was (Author: kanishthkarthik):
I encountered the same issue in thrift 0.10.0 and it still persists in thrift 
0.14.0. I have created a pull request to resolve this. 

The problem is that the IO thread is stuck trying to obtain the lock on the 
first line in addTask and this lock is held by the worker (case of when the 
task has expired) that is trying to notify the IO thread but can't if the 
buffer is full and will wait endlessly. Both will not let go before the other 
hence deadlock.

This can be resolved if we release the lock while closing the connection and 
notifying just like the case of processing when the task is not expired and 
acquire it back once that is done.

Refer: ThreadManager::Task::run() for the same.

> TNonblockingServer blocked int addTask(IOThread) and notify(workerThread)
> -------------------------------------------------------------------------
>
>                 Key: THRIFT-4963
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4963
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>    Affects Versions: 0.12.0
>            Reporter: chenguang9239
>            Priority: Major
>         Attachments: CppClient.cpp, CppServer.cpp, deadlock_thread.png, 
> wait_timeout.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello!
> When using the C++ TNonblockingServer (with a thread pool), I found that it 
> blocks under high QPS.
> I used pstack to print the thread stacks and found the worker thread and the 
> IO thread blocked at:
> The worker thread calls notifyIOThread when it handles an expired task, then 
> calls TNonblockingIOThread::notify and waits for POLLOUT in poll without a 
> timeout. 
> The IO thread calls addTask when it receives requests, and it locks 
> threadManager->mutex_ in addTask without a timeout.
> Is this a bug in Thrift 0.12.0?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
