[ https://issues.apache.org/jira/browse/FLINK-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingjie Cao updated FLINK-14872:
--------------------------------
    Description: 
Currently, the buffer pool size of an InputGate that reads from a blocking
ResultPartition is unbounded. The InputGate can therefore take too many
buffers, so that the ResultPartition of the same task can no longer acquire
enough core buffers, which eventually leads to a deadlock.

Consider the following case:

Core buffers are reserved for the InputGate and the ResultPartition -> The
InputGate consumes many buffers (not including the buffers reserved for the
ResultPartition) -> Other tasks acquire exclusive buffers for their InputGates
and trigger a redistribution of buffers (the buffers taken by the first
InputGate cannot be released) -> The first task, whose InputGate holds many
buffers, begins to emit records but cannot acquire enough core buffers (some
operators may not emit records immediately, or there is simply nothing to emit
yet) -> Deadlock.

I think we can fix this problem by limiting the number of buffers that can be
allocated by an InputGate that reads from a blocking ResultPartition.
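A minimal Java sketch of the proposed limit, under simplifying assumptions: SharedPool, GatePool and partitionGetsCoreBuffers are hypothetical names, not Flink's actual buffer pool API, and the reservation and redistribution steps of the scenario are collapsed into a single shared counter. The InputGate keeps every buffer it obtains (as if the data read from the blocking ResultPartition cannot be released yet). Without a cap it drains the shared pool and the ResultPartition cannot get its core buffers; with a cap, buffers are left for it.

```java
/**
 * Hypothetical model (not Flink's real classes) of a buffer pool shared by the
 * InputGate and the ResultPartition of the same task.
 */
public class BlockingGateBufferCapSketch {

    /** Hypothetical shared pool: a fixed number of buffers handed out one at a time. */
    static final class SharedPool {
        private int available;

        SharedPool(int total) {
            this.available = total;
        }

        /** Returns true if a buffer was handed out, false if the pool is empty. */
        boolean tryRequest() {
            if (available == 0) {
                return false;
            }
            available--;
            return true;
        }
    }

    /** Hypothetical per-gate view of the shared pool; maxBuffers < 0 means unbounded. */
    static final class GatePool {
        private final SharedPool shared;
        private final int maxBuffers;
        private int held;

        GatePool(SharedPool shared, int maxBuffers) {
            this.shared = shared;
            this.maxBuffers = maxBuffers;
        }

        /** Requests one more buffer unless the cap is reached or the shared pool is empty. */
        boolean tryRequest() {
            if (maxBuffers >= 0 && held >= maxBuffers) {
                return false; // the proposed limit: the gate may not grow past its cap
            }
            if (!shared.tryRequest()) {
                return false;
            }
            held++; // never released in this model, like data not yet consumed
            return true;
        }
    }

    /**
     * Lets the gate read ahead as far as it can, then checks whether the
     * partition can still obtain all of its core buffers.
     */
    static boolean partitionGetsCoreBuffers(int totalBuffers, int gateCap,
                                            int partitionCore, int readAhead) {
        SharedPool shared = new SharedPool(totalBuffers);
        GatePool gate = new GatePool(shared, gateCap);
        for (int i = 0; i < readAhead; i++) {
            if (!gate.tryRequest()) {
                break;
            }
        }
        int acquired = 0;
        while (acquired < partitionCore && shared.tryRequest()) {
            acquired++;
        }
        return acquired == partitionCore;
    }

    public static void main(String[] args) {
        // 100 buffers in total, the partition needs 10, the gate tries to take 1000.
        System.out.println("unbounded gate: partition ok = "
                + partitionGetsCoreBuffers(100, -1, 10, 1000));
        System.out.println("gate capped at 50: partition ok = "
                + partitionGetsCoreBuffers(100, 50, 10, 1000));
    }
}
```

In this model the unbounded gate takes all 100 buffers and the partition gets none, while a cap of 50 leaves enough for the partition's 10 core buffers. The cap value is arbitrary here; the issue does not say how the limit should be chosen.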

  was:
Currently, the buffer pool size of an InputGate that reads from a blocking
ResultPartition is unbounded. The InputGate can therefore take too many
buffers, so that the ResultPartition of the same task can no longer acquire
enough core buffers, which eventually leads to a deadlock.

Consider the following case:

Core buffers are reserved for the InputGate and the ResultPartition -> The
InputGate consumes many buffers (not including the buffers reserved for the
ResultPartition) -> Other tasks acquire exclusive buffers for their InputGates
and trigger a redistribution of buffers (the buffers taken by the first
InputGate cannot be released) -> The first task, whose InputGate holds many
buffers, begins to emit records but cannot acquire enough core buffers (some
operators may not emit records immediately, or there is simply nothing to emit
yet) -> Deadlock.

I think we can fix this problem by limiting the number of buffers that can be
allocated by an InputGate that reads from a blocking ResultPartition.


> Potential deadlock for task reading from blocking ResultPartition.
> ------------------------------------------------------------------
>
>                 Key: FLINK-14872
>                 URL: https://issues.apache.org/jira/browse/FLINK-14872
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Yingjie Cao
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
