[ https://issues.apache.org/jira/browse/FLINK-14472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956654#comment-16956654 ]
zhijiang commented on FLINK-14472:
----------------------------------

Thanks for your attention to this issue.

You are right that the existing back pressure monitor gives invalid results in some known scenarios. Although this ticket was not motivated by that limitation, I think we can solve it at the same time while implementing the new monitoring approach.

The current monitoring approach is heavy-weight and fragile, and it also needs to understand the implementation of `LocalBufferPool`, which is bad design. I tried to provide a transparent method in `BufferProvider` to indicate whether it is back pressured or not, so that the monitor caller can rely on this method to get the back pressure result. There is then no need to analyze specific thread stacks inside the monitor tracker, which requires understanding the implementation of `BufferProvider`. It also has the benefit that the REST call only needs to carry light-weight info.

> Implement back-pressure monitor with non-blocking outputs
> ---------------------------------------------------------
>
>                 Key: FLINK-14472
>                 URL: https://issues.apache.org/jira/browse/FLINK-14472
>             Project: Flink
>          Issue Type: Task
>          Components: Runtime / Network
>            Reporter: zhijiang
>            Assignee: Yingjie Cao
>            Priority: Minor
>             Fix For: 1.10.0
>
>
> Currently the back-pressure monitor relies on detecting task threads that are
> stuck in `requestBufferBuilderBlocking`. There are actually two cases that
> cause back-pressure ATM:
> * There are no available buffers in `LocalBufferPool` and the quota granted
> by the global pool is also exhausted. Then we need to wait for buffers to be
> recycled to `LocalBufferPool`.
> * There are no available buffers in `LocalBufferPool`, but the quota has not
> been used up. The request for a buffer from the global pool blocks because
> the global pool has no available buffers. Then we need to wait for buffers
> to be recycled to the global pool.
>
> We already implemented the non-blocking output for the first case in
> [FLINK-14396|https://issues.apache.org/jira/browse/FLINK-14396], and we
> expect the second case to be done together with adjusting the back-pressure
> monitor, which could check `RecordWriter#isAvailable` instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
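
To illustrate the idea in the comment above (the buffer provider reports its own back pressure state, and the monitor samples that flag instead of inspecting thread stacks), here is a minimal sketch. Every name in it (`PressureAwareProvider`, `ToyBufferPool`, `sampleRatio`) is hypothetical and made up for illustration. It is not Flink's `BufferProvider`, `LocalBufferPool`, or `RecordWriter#isAvailable`, and it only assumes that "back pressured" means "no buffer can be handed out right now".

```java
/** Hypothetical sketch; none of these types are Flink's real classes. */
public class BackPressureSketch {

    /** Stand-in for a provider that exposes its own back pressure state. */
    interface PressureAwareProvider {
        boolean isBackPressured();
    }

    /** Toy fixed-size pool: back pressured while every buffer is in use. */
    static final class ToyBufferPool implements PressureAwareProvider {
        private final int capacity;
        private int inUse;

        ToyBufferPool(int capacity) {
            this.capacity = capacity;
        }

        /** Non-blocking request: returns false instead of waiting. */
        synchronized boolean tryRequest() {
            if (inUse == capacity) {
                return false;
            }
            inUse++;
            return true;
        }

        /** Returns one buffer to the pool, if any is outstanding. */
        synchronized void recycle() {
            if (inUse > 0) {
                inUse--;
            }
        }

        @Override
        public synchronized boolean isBackPressured() {
            return inUse == capacity;
        }
    }

    /** Monitor side: fraction of samples in which the provider was back pressured. */
    static double sampleRatio(PressureAwareProvider provider, int samples) {
        int hits = 0;
        for (int i = 0; i < samples; i++) {
            if (provider.isBackPressured()) {
                hits++;
            }
        }
        return (double) hits / samples;
    }

    public static void main(String[] args) {
        ToyBufferPool pool = new ToyBufferPool(2);
        pool.tryRequest();
        pool.tryRequest();
        // Pool is exhausted, so every sample reports back pressure.
        System.out.println(sampleRatio(pool, 10));
        pool.recycle();
        // One buffer is free again, so no sample reports back pressure.
        System.out.println(sampleRatio(pool, 10));
    }
}
```

The monitor in this sketch only reads a boolean, so the value it would send back over a REST call is a single ratio rather than a set of stack traces. A real monitor would take its samples over time while the task is running; the tight loop here is only for demonstration.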