[ https://issues.apache.org/jira/browse/FLINK-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739385#comment-16739385 ]
Piotr Nowojski commented on FLINK-11082:
----------------------------------------

Another issue. Could this bug explain why one user recently reported higher CPU usage and a 300% increase in the number of packets sent between the nodes after upgrading from Flink 1.4?

Previously we were aware that credit-based flow control increases the network traffic/number of messages sent between nodes by 100%. But if we announce fresh partial buffers to the receiver immediately, could it be that a small chunk of that data is being sent prematurely, before {{flushRequested}} or the next {{BufferConsumer}} is enqueued? Sending a chunk of data prematurely and assigning new credit would explain the remaining unaccounted "200%" of messages being sent.

Btw, [~zjwang], if a channel is idle, two exclusive buffers will be assigned to the sender, and it will have some buffers for immediate use whenever the channel becomes active?

> Increase backlog only if it is available for consumption
> --------------------------------------------------------
>
>                 Key: FLINK-11082
>                 URL: https://issues.apache.org/jira/browse/FLINK-11082
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.6, 1.6.3, 1.7.1, 1.8.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>
> The backlog should indicate how many buffers are available in the subpartition
> for downstream consumption. Availability depends on two factors: one is that
> the {{BufferConsumer}} is finished, and the other is that a flush is triggered.
> In the current implementation, the backlog is increased as soon as a
> {{BufferConsumer}} is added to the subpartition, even though this
> {{BufferConsumer}} is not yet available for network transport.
> Furthermore, the backlog affects the requesting of floating buffers on the
> downstream side. That means some floating buffers are fetched in advance but
> are not used for a long time, so the floating buffers are not used
> efficiently.
> We found this scenario to be especially pronounced with the rebalance selector
> on the upstream side, so we want to increase the backlog only when a
> {{BufferConsumer}} is finished or a flush is triggered.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
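
The backlog rule proposed in the issue description can be illustrated with a toy model. This is only a minimal sketch and not Flink's implementation: {{BacklogSketch}}, {{ToyBuffer}}, {{eagerBacklog}} and {{availableBacklog}} are hypothetical names made up for illustration, and the queue is reduced to two flags per buffer. It contrasts counting every enqueued buffer with counting only buffers that are finished or covered by a flush.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of one subpartition's queue; hypothetical, not Flink's classes. */
final class BacklogSketch {

    /** Hypothetical stand-in for a BufferConsumer: tracks only two availability flags. */
    static final class ToyBuffer {
        boolean finished; // the writer has completed this buffer
        boolean flushed;  // a flush was triggered while this buffer was queued
    }

    private final Deque<ToyBuffer> queue = new ArrayDeque<>();

    /** Enqueue a buffer that may still be receiving data. */
    void add(ToyBuffer buffer) {
        queue.addLast(buffer);
    }

    /** A flush makes every buffer that is queued right now readable. */
    void flush() {
        for (ToyBuffer buffer : queue) {
            buffer.flushed = true;
        }
    }

    /** Behaviour described as current in the issue: every enqueued buffer counts. */
    int eagerBacklog() {
        return queue.size();
    }

    /** Proposed rule: count a buffer only once it is finished or flushed. */
    int availableBacklog() {
        int count = 0;
        for (ToyBuffer buffer : queue) {
            if (buffer.finished || buffer.flushed) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BacklogSketch subpartition = new BacklogSketch();
        ToyBuffer first = new ToyBuffer();
        subpartition.add(first);
        System.out.println("after add:    eager=" + subpartition.eagerBacklog()
                + " available=" + subpartition.availableBacklog());
        first.finished = true;
        System.out.println("after finish: eager=" + subpartition.eagerBacklog()
                + " available=" + subpartition.availableBacklog());
        subpartition.add(new ToyBuffer());
        subpartition.flush();
        System.out.println("after flush:  eager=" + subpartition.eagerBacklog()
                + " available=" + subpartition.availableBacklog());
    }
}
```

In this sketch the eager count rises as soon as a buffer is added, while the availability-based count rises only after the buffer is finished or a flush is triggered, which is the point at which a downstream floating buffer could actually be put to use.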