[ 
https://issues.apache.org/jira/browse/FLINK-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhijiang updated FLINK-8523:
----------------------------
    Comment: was deleted

(was: Hey [~pnowojski], [~NicoK], glad to see we are coming back to this issue.

I think I understand your concerns completely, and actually there are two 
separate issues to be confirmed: 

1. Whether to spill intermediate buffers before barrier alignment?

If we keep spilling the buffers that arrive on a blocked channel (one that has 
already received the barrier), as before, we free more floating buffers that 
can be used by the unblocked channels. From this point of view, spilling seems 
to benefit barrier alignment. The only concern is the additional IO cost of 
spilling and replaying the intermediate buffers. If alignment is fast, only a 
few intermediate buffers need to be spilled and they may still be in the OS 
cache, so the cost can be ignored. But if the spilled data is very large in an 
IO-sensitive environment, it will greatly hurt throughput (TPS).

If we do not spill, as in the current code, the only concern is that we cannot 
make full use of the floating buffers before alignment, which may delay barrier 
alignment in some scenarios. 

So based on the above analysis, both approaches have advantages and 
disadvantages, and the behavior may differ across scenarios. In 
non-credit-based mode we have to spill the data to avoid deadlock, but now we 
have the chance to avoid spilling. It also seems better not to involve any 
disk IO in the runtime stack for streaming jobs. From this point of view, I 
prefer not spilling. Maybe we need more tests, feedback, or thinking before 
the final decision.

2. Avoid requesting floating buffers for blocked channels

I think we can reach an agreement on this issue. No matter what the conclusion 
of the first issue is, doing this is reasonable and brings a definite benefit. 
This JIRA focuses on this issue.

 

BTW, we once made another improvement to speed up barrier alignment: reading 
unblocked channels with first priority instead of the current random mode 
(FIFO based on network receiving order). It indeed improves barrier alignment 
a lot, because the task no longer selects unused intermediate buffers before 
alignment. But this selection may also change the original back pressure 
behavior and affect performance in some scenarios, so it may also be a 
trade-off.)
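
The channel-selection idea at the end of the comment above can be illustrated 
with a small toy model. This is only a sketch under assumptions: `next_channel`, 
the `ready` queue, and the `blocked` set are hypothetical names, not Flink 
classes or methods, and the model ignores back pressure entirely.

```python
from collections import deque


def next_channel(ready, blocked):
    """Return the first queued channel that is not blocked, or None.

    `ready` is a deque of channel names in network arrival (FIFO) order.
    Entries that belong to blocked channels are skipped and left queued,
    so their buffers are not touched until alignment finishes.
    """
    for index, name in enumerate(ready):
        if name not in blocked:
            del ready[index]
            return name
    return None


# Channel "a" has already delivered its barrier and is blocked.
ready = deque(["a", "b", "a", "c"])
blocked = {"a"}
order = [next_channel(ready, blocked) for _ in range(3)]
```

Plain FIFO selection would pick the queued entries of the blocked channel 
first; the priority variant reads "b" and "c" and leaves both entries of "a" 
in the queue until the channel is unblocked.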

> Stop assigning floating buffers for blocked input channels in exactly-once 
> mode
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-8523
>                 URL: https://issues.apache.org/jira/browse/FLINK-8523
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>              Labels: pull-request-available
>
> In exactly-once mode, an input channel is set to the blocked state when a 
> barrier is read from it. The blocked state is released after barrier 
> alignment completes or is cancelled.
>  
> In credit-based network flow control, we should avoid assigning floating 
> buffers to blocked input channels because the buffers after the barrier will 
> not be processed by the operator until alignment.
> By doing so, we can make full use of floating buffers and speed up barrier 
> alignment to some extent.
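
The proposal in the description above can be illustrated with a toy model of 
floating-buffer assignment. It is a minimal sketch, not Flink's implementation: 
the `Channel` class, its fields, and `assign_floating_buffers` are hypothetical 
names, and the round-robin policy is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Hypothetical stand-in for an input channel; not a Flink class."""
    name: str
    backlog: int           # buffers the sender still wants to ship
    blocked: bool = False  # True once the checkpoint barrier has been read
    floating: int = 0      # floating buffers currently assigned


def assign_floating_buffers(channels, available):
    """Hand out floating buffers round-robin, skipping blocked channels.

    A channel stops receiving buffers once its backlog is covered.
    Returns the number of buffers left in the shared pool.
    """
    progress = True
    while available > 0 and progress:
        progress = False
        for ch in channels:
            if available == 0:
                break
            if ch.blocked or ch.floating >= ch.backlog:
                continue
            ch.floating += 1
            available -= 1
            progress = True
    return available


channels = [
    Channel("a", backlog=4, blocked=True),  # barrier already received
    Channel("b", backlog=3),
    Channel("c", backlog=5),
]
left = assign_floating_buffers(channels, available=6)
```

In this model the blocked channel "a" receives no floating buffers, so all six 
go to "b" and "c", the channels whose data the operator can still process 
before alignment.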



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
