[ https://issues.apache.org/jira/browse/FLINK-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740056#comment-16740056 ]
zhijiang commented on FLINK-11082: ---------------------------------- Considering the other issue you mentioned, I think the higher CPU usage and packet counts are not caused by this bug. Although the backlog is increased as soon as a buffer is added to the queue, this partial buffer would not be transported over the network. The availability condition for transport is determined by two factors: * Data available: the buffer is finished, or a flush was triggered. * Credit available: credit > 0, or the next buffer is an event. So the data-available condition might not be satisfied in {{PipelinedSubpartition#isAvailable()}} even though the increased backlog makes credit available. Another point you raised above is partially true. We always try to assign two additional credits to the sender in a best-effort way, so that we can assume the sender never needs to wait for a credit notification once data becomes available. But these two additional credits may come from exclusive or floating buffers, and the amount might be less than two. E.g., if the exclusive buffers are already used and inserted into the queue in the input channel, and not enough floating buffers are available in the pool, then the sender might get only one or zero additional credits when idle. Do you remember whether the problem the user reported was for {{PipelinedSubpartition}} or {{SpillableSubpartition}}? For the pipelined case I cannot currently think of an issue that would cause that. But for the blocking case, I need to double-check whether the current implementation might increase the total number of {{BufferResponse}} messages compared to before. > Increase backlog only if it is available for consumption > -------------------------------------------------------- > > Key: FLINK-11082 > URL: https://issues.apache.org/jira/browse/FLINK-11082 > Project: Flink > Issue Type: Sub-task > Components: Network > Affects Versions: 1.5.6, 1.6.3, 1.7.1, 1.8.0 > Reporter: zhijiang > Assignee: zhijiang > Priority: Major > > The backlog should indicate how many buffers are available in the subpartition > for the downstream's consumption.
The availability is determined by two > factors: one is the {{BufferConsumer}} being finished, and the other is a flush being triggered. > In the current implementation, when a {{BufferConsumer}} is added to the > subpartition, the backlog is increased as a result, even though this > {{BufferConsumer}} is not yet available for network transport. > Furthermore, the backlog affects the requesting of floating buffers on the > downstream side. That means some floating buffers are fetched in advance but > not used for a long time, so the floating buffers are not made use of > efficiently. > We found this scenario especially with the rebalance selector on the upstream > side, so we want to change the backlog to be increased only when a > {{BufferConsumer}} is finished or a flush is triggered. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
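The two-factor transport condition described in the comment above can be sketched roughly as follows. This is a hypothetical illustration only, not the actual Flink implementation: the class and field names ({{SubpartitionView}}, {{backlog}}, {{flushRequested}}, {{credit}}, {{nextIsEvent}}) are invented for clarity.

```java
// Hypothetical sketch of the two-factor availability check discussed above.
// Names are illustrative; the real logic lives in Flink's
// PipelinedSubpartition / CreditBasedSequenceNumberingViewReader classes.
class SubpartitionView {
    int backlog;             // finished buffers queued for consumption
    boolean flushRequested;  // a flush was triggered on a partial buffer
    int credit;              // credits announced by the downstream channel
    boolean nextIsEvent;     // head of the queue is an event, not a data buffer

    /** Data is transportable only when a buffer is finished or a flush fired. */
    boolean isDataAvailable() {
        return backlog > 0 || flushRequested;
    }

    /** Credit is only required for data buffers; events bypass the check. */
    boolean isCreditAvailable() {
        return credit > 0 || nextIsEvent;
    }

    /** Both conditions must hold before a BufferResponse is sent. */
    boolean isAvailableForTransport() {
        return isDataAvailable() && isCreditAvailable();
    }
}
```

Under this sketch, an unfinished buffer that merely increments the backlog still fails {{isDataAvailable()}}, which is why the partial buffer is not sent even if the inflated backlog has attracted credit.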