[ https://issues.apache.org/jira/browse/FLINK-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735815#comment-16735815 ]

Piotr Nowojski commented on FLINK-11082:
----------------------------------------

Hmmm, I think I’m starting to understand the issue. Currently, the credit-based 
model always tries to maintain 2 spare buffers per remote channel. If there are 
3 buffers in the backlog that are already used, it will try to acquire and 
assign 3 floating buffers on top of the 2 exclusive buffers for that channel.
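A minimal Java sketch of the receiver-side rule described above, assuming a channel aims to hold (backlog + exclusive buffers) free buffers and requests whatever it is missing from the shared floating pool. The class and method names (`CreditSketch`, `floatingBuffersToRequest`) are hypothetical and are not Flink's API.

```java
// Hypothetical model of the credit rule: a remote channel wants
// (backlog + initialCredit) free buffers, i.e. always initialCredit spare
// buffers beyond what the sender has announced as queued.
final class CreditSketch {

    /**
     * @param backlog          buffers the sender reports as queued for this channel
     * @param initialCredit    exclusive buffers per channel (2 in the comment above)
     * @param availableBuffers free buffers the channel already holds
     * @return floating buffers the channel would request from the shared pool
     */
    static int floatingBuffersToRequest(int backlog, int initialCredit, int availableBuffers) {
        int required = backlog + initialCredit;
        return Math.max(0, required - availableBuffers);
    }

    public static void main(String[] args) {
        // Backlog of 3 with both exclusive buffers free: 3 floating buffers are requested.
        System.out.println(floatingBuffersToRequest(3, 2, 2)); // prints 3
        // No backlog: the exclusive buffers alone satisfy the rule.
        System.out.println(floatingBuffersToRequest(0, 2, 2)); // prints 0
    }
}
```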

That, combined with empty `BufferConsumers` bumping the backlog to 1, means 
that floating buffers are useless: they are always assigned somewhere 
(completely randomly), even at very low throughput, or even with no throughput 
at all. If output flushing is disabled and we suddenly stop producing records 
for a couple of minutes, no data will be sent and the input queues will be 
empty, yet because those “empty” enqueued `BufferConsumers` bump the backlog 
to 1, all floating buffers will be assigned and frozen/wasted somewhere.
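The zero-throughput case can be simulated under the same assumption. In this hypothetical model, 10 idle channels each report a backlog of 1 caused by an empty `BufferConsumer`, and a pool of 8 floating buffers is handed out in channel order (in practice the order depends on which channel happens to ask first). All names are illustrative only.

```java
import java.util.Arrays;

// Hypothetical simulation: idle channels with a backlog of 1 drain the floating pool.
final class IdleBacklogDemo {

    static final int INITIAL_CREDIT = 2; // exclusive buffers per channel

    /** Grants floating buffers first come, first served until the pool is empty. */
    static int[] assignFloatingBuffers(int[] backlogs, int availablePerChannel, int floatingPoolSize) {
        int[] assigned = new int[backlogs.length];
        int pool = floatingPoolSize;
        for (int i = 0; i < backlogs.length; i++) {
            // Same rule as before: want (backlog + initial credit) free buffers.
            int wanted = Math.max(0, backlogs[i] + INITIAL_CREDIT - availablePerChannel);
            int granted = Math.min(wanted, pool);
            assigned[i] = granted;
            pool -= granted;
        }
        return assigned;
    }

    public static void main(String[] args) {
        int[] backlogs = new int[10];
        Arrays.fill(backlogs, 1); // one empty BufferConsumer per channel, no actual data
        int[] assigned = assignFloatingBuffers(backlogs, INITIAL_CREDIT, 8);
        int total = Arrays.stream(assigned).sum();
        System.out.println(Arrays.toString(assigned)); // [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        System.out.println("floating buffers held by idle channels: " + total); // 8
    }
}
```

All 8 floating buffers end up parked on channels that have nothing to receive, so a channel that later becomes busy finds the pool empty.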

The original intention was to assign floating buffers to “heavily” used 
channels, and that doesn’t happen right now, is that right?

> Increase backlog only if it is available for consumption
> --------------------------------------------------------
>
>                 Key: FLINK-11082
>                 URL: https://issues.apache.org/jira/browse/FLINK-11082
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.8.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>
> The backlog should indicate how many buffers in the subpartition are 
> available for downstream consumption. Availability depends on two factors: 
> either the {{BufferConsumer}} is finished, or a flush is triggered.
> In the current implementation, the backlog is increased as soon as a 
> {{BufferConsumer}} is added to the subpartition, even though that 
> {{BufferConsumer}} is not yet available for network transport.
> Furthermore, the backlog affects how floating buffers are requested on the 
> downstream side. As a result, some floating buffers are fetched in advance 
> but are not used for a long time, so floating buffers are not used 
> efficiently.
> We found this scenario to be especially severe for the rebalance selector on 
> the upstream side, so we want to change the backlog to be increased only 
> when a {{BufferConsumer}} is finished or a flush is triggered.
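The proposed rule can be sketched as follows. This is a simplified, hypothetical model, not the actual Flink change: `QueuedBuffer` stands in for a `BufferConsumer`, and the flush flag is never reset. A queued buffer counts toward the backlog only if it is finished, or if a flush was requested and it already holds data.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model: compare "backlog on add" with "backlog when available".
final class AvailableBacklogSketch {

    /** Stand-in for a BufferConsumer: written bytes and whether the writer finished it. */
    static final class QueuedBuffer {
        int bytesWritten;
        boolean finished;
    }

    private final Deque<QueuedBuffer> queue = new ArrayDeque<>();
    private boolean flushRequested; // simplification: never cleared in this sketch

    QueuedBuffer add() {
        QueuedBuffer b = new QueuedBuffer();
        queue.addLast(b);
        return b;
    }

    void flush() {
        flushRequested = true;
    }

    /** Current behaviour: every enqueued buffer bumps the backlog immediately. */
    int backlogOnAdd() {
        return queue.size();
    }

    /** Proposed behaviour: count only buffers the downstream could actually read. */
    int backlogWhenAvailable() {
        int backlog = 0;
        for (QueuedBuffer b : queue) {
            if (b.finished || (flushRequested && b.bytesWritten > 0)) {
                backlog++;
            }
        }
        return backlog;
    }

    public static void main(String[] args) {
        AvailableBacklogSketch sub = new AvailableBacklogSketch();
        QueuedBuffer b = sub.add();                       // empty buffer enqueued by the writer
        System.out.println(sub.backlogOnAdd());           // 1: credit already requested
        System.out.println(sub.backlogWhenAvailable());   // 0: nothing readable yet
        b.bytesWritten = 128;
        sub.flush();
        System.out.println(sub.backlogWhenAvailable());   // 1: flush made the data available
    }
}
```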



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
