[ 
https://issues.apache.org/jira/browse/FLINK-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407129#comment-17407129
 ] 

Piotr Nowojski commented on FLINK-24035:
----------------------------------------

The solution here is to always request at least one buffer.

This is not an issue on the output side, because there are no 
{{BufferListeners}} in that case. Also, if a buffer is requested on the output 
side, it will be used and eventually flushed.

On the input side, on the other hand, with exclusive buffers > 0, we are 
already requesting exclusive buffers in a blocking way with a timeout 
(FLINK-12852), so we know that the task will be able to make progress 
regardless of whether we notify about more buffers or not. With exclusive 
buffers = 0, this solution requests a single floating buffer, so we will also 
be able to make progress. Once data starts flowing and this single buffer is 
recycled, listeners will be notified and more buffers will be requested.
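
To make the last point concrete, here is a minimal sketch of the "one 
guaranteed buffer, then notify on recycle" idea. It is a toy model only: 
OneBufferPool and OneBufferDemo are hypothetical names, not Flink classes, and 
the recycle-notifies-listener behaviour is an assumption of this sketch rather 
than a description of the actual patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical toy pool for illustration; NOT Flink's LocalBufferPool.
final class OneBufferPool {
    private final Deque<byte[]> available = new ArrayDeque<>();
    private final Deque<Consumer<byte[]>> listeners = new ArrayDeque<>();

    OneBufferPool() {
        // Assumption modelling the proposal: the pool always starts with one buffer.
        available.add(new byte[32]);
    }

    /** Returns a buffer, or null if none is available right now. */
    byte[] request() {
        return available.poll();
    }

    /** Registers a callback that receives the next recycled buffer. */
    void addListener(Consumer<byte[]> listener) {
        listeners.add(listener);
    }

    /** Hands the buffer to a waiting listener if there is one, else stores it. */
    void recycle(byte[] buffer) {
        Consumer<byte[]> listener = listeners.poll();
        if (listener != null) {
            listener.accept(buffer);
        } else {
            available.add(buffer);
        }
    }
}

final class OneBufferDemo {
    /** Returns how many times the waiting listener was notified. */
    static int run() {
        OneBufferPool pool = new OneBufferPool();
        int[] notified = {0};

        byte[] first = pool.request();       // the guaranteed buffer: progress is possible
        if (pool.request() == null) {        // nothing left, so wait for a recycle
            pool.addListener(b -> notified[0]++);
        }
        pool.recycle(first);                 // data flowed, the buffer comes back
        return notified[0];
    }

    public static void main(String[] args) {
        System.out.println("listener notifications: " + run());
    }
}
```

Because the first request always succeeds, some buffer is guaranteed to pass 
through recycle() later, and that is the point where the waiting listener is 
woken up in this model.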

> Fix the deadlock issue caused by buffer listeners may not be notified
> ---------------------------------------------------------------------
>
>                 Key: FLINK-24035
>                 URL: https://issues.apache.org/jira/browse/FLINK-24035
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.14.0
>            Reporter: Yingjie Cao
>            Assignee: Yingjie Cao
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> The buffer listeners are not notified when the local buffer pool receives 
> an availability notification from the global pool. This may cause a 
> deadlock:
>  # A LocalBufferPool is created, but there are no available buffers in the 
> global NetworkBufferPool.
>  # The LocalBufferPool registers an available buffer listener with the 
> global NetworkBufferPool.
>  # The BufferManager requests buffers from the LocalBufferPool, but no buffer 
> is available. As a result, it registers an available buffer listener with 
> the LocalBufferPool.
>  # A buffer is recycled to the global pool, and the local buffer pool is 
> notified about the available buffer.
>  # The local buffer pool requests the available buffer from the global pool, 
> but the registered available buffer listener of the BufferManager is not 
> notified and never gets a chance to be notified, so a deadlock occurs.
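
The five quoted steps can be replayed in a small single-threaded model. This 
is only a sketch of the call order: ToyGlobalPool, ToyLocalPool and 
LostNotificationDemo are hypothetical names, and none of this is Flink's real 
NetworkBufferPool, LocalBufferPool or BufferManager code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy classes; they only mimic the call order of the quoted steps.
final class ToyGlobalPool {
    private int free = 0;                                  // step 1: nothing available
    private final Deque<Runnable> listeners = new ArrayDeque<>();

    boolean tryTake() {
        if (free == 0) {
            return false;
        }
        free--;
        return true;
    }

    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    /** Step 4: a buffer returns to the global pool and one listener is told. */
    void recycle() {
        free++;
        Runnable listener = listeners.poll();
        if (listener != null) {
            listener.run();
        }
    }
}

final class ToyLocalPool {
    private final ToyGlobalPool global;
    private final Deque<Runnable> listeners = new ArrayDeque<>();
    private int held = 0;

    ToyLocalPool(ToyGlobalPool global) {
        this.global = global;
        if (global.tryTake()) {
            held++;
        } else {
            global.addListener(this::onGlobalAvailable);   // step 2
        }
    }

    /** Step 3: returns true if a buffer was handed out, else queues the listener. */
    boolean request(Runnable listenerIfEmpty) {
        if (held > 0) {
            held--;
            return true;
        }
        listeners.add(listenerIfEmpty);
        return false;
    }

    int held() {
        return held;
    }

    /** Step 5: takes the buffer but, as in the bug report, tells nobody. */
    private void onGlobalAvailable() {
        if (global.tryTake()) {
            held++;
        }
    }
}

final class LostNotificationDemo {
    /** Replays steps 1 to 5 and reports whether the waiting listener fired. */
    static boolean replay() {
        ToyGlobalPool global = new ToyGlobalPool();
        ToyLocalPool local = new ToyLocalPool(global);
        boolean[] notified = {false};
        local.request(() -> notified[0] = true);
        global.recycle();
        return notified[0];
    }

    public static void main(String[] args) {
        System.out.println("BufferManager listener notified: " + replay());
    }
}
```

In this model the local pool ends up holding the buffer while the waiting 
listener is never called, which is the stuck state the report describes.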



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
