[ https://issues.apache.org/jira/browse/FLINK-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740056#comment-16740056 ]

zhijiang commented on FLINK-11082:
----------------------------------

Regarding the other issue you mentioned, I do not think the higher CPU usage 
and packet counts are caused by this bug.

Although the backlog is increased as soon as the buffer is added into the 
queue, this partial buffer is not transported over the network. The 
availability condition for transport is determined by two factors:
 * Data availability: a buffer is finished, or a flush has been triggered.
 * Credit availability: credit > 0, or the next element is an event.

So the data availability condition might not be satisfied in 
{{PipelinedSubpartition#isAvailable()}} even though the increased backlog 
makes credit available.
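The two-factor check above can be sketched roughly as follows. This is an illustrative model, not Flink's actual internals; all class and field names here are hypothetical:

```java
// Hedged sketch of the two-factor transport-availability check described
// above. Names are illustrative; they do not match Flink's real classes.
class AvailabilitySketch {
    boolean flushRequested;   // a flush has been triggered
    int finishedBuffers;      // fully written buffers queued in the subpartition
    int credit;               // credits announced by the downstream channel
    boolean nextIsEvent;      // next queued element is an event, not a data buffer

    // Data availability: a finished buffer exists or a flush was triggered.
    boolean isDataAvailable() {
        return flushRequested || finishedBuffers > 0;
    }

    // Credit availability: positive credit, or the next element is an event.
    boolean isCreditAvailable() {
        return credit > 0 || nextIsEvent;
    }

    // Transport happens only when BOTH conditions hold, which is why extra
    // credit alone (from an inflated backlog) does not trigger a send.
    boolean isAvailableForTransport() {
        return isDataAvailable() && isCreditAvailable();
    }
}
```

So even if the inflated backlog yields credit, a partial (unfinished, unflushed) buffer keeps `isDataAvailable()` false and nothing is sent.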

Another point you made above is only partially true. We always try to assign 
two additional credits to the sender in a best-effort way, so we can assume 
the sender never needs to wait for a credit notification whenever data becomes 
available. But these two additional credits may come from exclusive or floating 
buffers, and the amount might be less than two. E.g., if the exclusive buffers 
are already in use and inserted into the input channel's queue, and there are 
not enough floating buffers in the pool, then the sender might get only one or 
zero additional credits when idle.
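The best-effort nature of those two additional credits can be sketched like this; again the names are hypothetical, and only the take-what-is-free logic reflects the point above:

```java
// Hedged sketch of best-effort assignment of up to two additional credits,
// drawn first from free exclusive buffers, then from the floating pool.
// Illustrative only; not Flink's actual API.
class CreditSketch {
    int freeExclusiveBuffers;  // exclusive buffers not already queued in the channel
    int freeFloatingBuffers;   // floating buffers still available in the pool

    // Try to grant up to 'wanted' credits; may return fewer when buffers run short.
    int grantAdditionalCredits(int wanted) {
        int fromExclusive = Math.min(wanted, freeExclusiveBuffers);
        int fromFloating = Math.min(wanted - fromExclusive, freeFloatingBuffers);
        return fromExclusive + fromFloating;
    }
}
```

With all exclusive buffers in use and one floating buffer left, the sender receives only one additional credit instead of two.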

Do you remember whether the problem the user reported was for 
{{PipelinedSubpartition}} or {{SpillableSubpartition}}? For the pipelined case 
I currently cannot think of an issue that would cause that. But for the 
blocking case, I need to double-check whether the current implementation might 
increase the total number of {{BufferResponse}} messages compared to before.
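For reference, the change proposed in the issue description below, counting a buffer toward the backlog only once it is finished or a flush is triggered, could be sketched as follows. This is a simplified model with hypothetical names, not the actual patch:

```java
// Hedged sketch of backlog accounting that defers counting an added
// BufferConsumer until it is finished or flushed. Illustrative only.
class BacklogSketch {
    private int backlog;     // buffers actually available for consumption
    private int unfinished;  // consumers added but not yet finished or flushed

    // Adding a consumer only raises the backlog if it is already finished.
    void add(boolean finished) {
        if (finished) {
            backlog++;
        } else {
            unfinished++;
        }
    }

    // Finishing a previously added consumer makes it count.
    void finishOne() {
        if (unfinished > 0) {
            unfinished--;
            backlog++;
        }
    }

    // A flush makes all pending (partial) consumers available at once.
    void flush() {
        backlog += unfinished;
        unfinished = 0;
    }

    int getBacklog() {
        return backlog;
    }
}
```

Under this scheme the downstream side would no longer request floating buffers for data that cannot yet be transported.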

> Increase backlog only if it is available for consumption
> --------------------------------------------------------
>
>                 Key: FLINK-11082
>                 URL: https://issues.apache.org/jira/browse/FLINK-11082
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.6, 1.6.3, 1.7.1, 1.8.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>
> The backlog should indicate how many buffers are available in the 
> subpartition for downstream consumption. The availability is determined by 
> two factors: one is the {{BufferConsumer}} being finished, and the other is a 
> flush being triggered.
> In the current implementation, the backlog is increased as soon as a 
> {{BufferConsumer}} is added into the subpartition, even though this 
> {{BufferConsumer}} is not yet available for network transport.
> Furthermore, the backlog affects requesting floating buffers on the 
> downstream side. That means some floating buffers are fetched in advance but 
> not used for a long time, so the floating buffers are not utilized 
> efficiently.
> We observed this scenario especially with the rebalance selector on the 
> upstream side, so we want to increase the backlog only when a 
> {{BufferConsumer}} is finished or a flush is triggered.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
