[ https://issues.apache.org/jira/browse/FLINK-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989957#comment-16989957 ]
Yingjie Cao commented on FLINK-14872:
-------------------------------------

As discussed, we will implement a temporary fix for this problem in version 1.10 and leave the proper fix to a future version of Flink. I have opened a PR [https://github.com/apache/flink/pull/10472]; could you please take a look? [~pnowojski]

> Potential deadlock for task reading from blocking ResultPartition.
> ------------------------------------------------------------------
>
>                 Key: FLINK-14872
>                 URL: https://issues.apache.org/jira/browse/FLINK-14872
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Assignee: Yingjie Cao
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the buffer pool of an InputGate reading from a blocking ResultPartition is unbounded. It can therefore use too many buffers and may leave the ResultPartition of the same task unable to acquire enough core buffers, finally leading to a deadlock.
>
> Consider the following case:
> 1. Core buffers are reserved for the InputGate and the ResultPartition.
> 2. The InputGate consumes many buffers (not including the buffers reserved for the ResultPartition).
> 3. Other tasks acquire exclusive buffers for their InputGates and trigger a redistribution of buffers, but the buffers taken by the first InputGate cannot be released.
> 4. The first task, whose InputGate holds many buffers, begins to emit records but cannot acquire enough core buffers (some operators may not emit records immediately, or there is simply nothing to emit).
> 5. Deadlock.
>
> I think we can fix this problem by limiting the number of buffers that can be allocated by an InputGate reading from a blocking ResultPartition.
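The failure mode and the proposed cap can be illustrated with a toy simulation. This is a minimal sketch, not Flink code: `GlobalPool`, `LocalPool`, `partition_gets_core` and all buffer counts are hypothetical, no buffer is ever recycled, and redistribution is reduced to another task taking buffers from the shared pool. It only shows why an unbounded InputGate pool can leave the ResultPartition of the same task short of its core buffers, and how capping the InputGate's pool avoids that.

```python
from typing import Optional


class GlobalPool:
    """Toy stand-in for a fixed set of network buffers (hypothetical, not Flink's NetworkBufferPool)."""

    def __init__(self, total: int) -> None:
        self.free = total

    def take(self, n: int) -> int:
        """Hand out up to n free buffers and return how many were actually handed out."""
        given = min(max(n, 0), self.free)
        self.free -= given
        return given


class LocalPool:
    """Toy per-InputGate / per-ResultPartition pool drawing buffers lazily from a GlobalPool."""

    def __init__(self, global_pool: GlobalPool, max_size: Optional[int] = None) -> None:
        self.global_pool = global_pool
        self.max_size = max_size  # None models the unbounded InputGate pool from the issue
        self.held = 0  # buffers in use; in this toy scenario none are recycled

    def request(self, n: int) -> int:
        """Try to obtain n more buffers, never growing past max_size when a cap is set."""
        if self.max_size is not None:
            n = min(n, self.max_size - self.held)
        got = self.global_pool.take(n)
        self.held += got
        return got


def partition_gets_core(total: int, core: int, input_demand: int,
                        other_exclusive: int, input_cap: Optional[int]) -> bool:
    """Replay the issue's scenario; True if the ResultPartition obtains all its core buffers."""
    pool = GlobalPool(total)
    gate = LocalPool(pool, max_size=input_cap)
    # The gate reads eagerly but leaves `core` buffers for the ResultPartition.
    # Its buffers are not recycled because the task has not emitted anything yet.
    gate.request(min(input_demand, total - core))
    # Another task takes exclusive buffers; redistribution assumes the gate would
    # shrink, but it cannot, so these come out of what was left for the partition.
    pool.take(other_exclusive)
    # The task finally starts emitting and its ResultPartition needs `core` buffers.
    partition = LocalPool(pool)
    return partition.request(core) == core


if __name__ == "__main__":
    print(partition_gets_core(20, 4, 100, 4, input_cap=None))  # False: unbounded gate starves the partition
    print(partition_gets_core(20, 4, 100, 4, input_cap=8))     # True: capped gate leaves room
```

The cap of 8 is an arbitrary illustrative value; the issue does not specify what the limit should be, and the actual change is in the linked PR.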