[ 
https://issues.apache.org/jira/browse/FLINK-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350801#comment-17350801
 ] 

Piotr Nowojski commented on FLINK-16641:
----------------------------------------

Hey [~kevin.cyj], sorry for the delay. I'm currently busy investigating a 
performance issue. Once I'm done with that, I will get back to this topic.

> Announce sender's backlog to solve the deadlock issue without exclusive 
> buffers
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-16641
>                 URL: https://issues.apache.org/jira/browse/FLINK-16641
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: Zhijiang
>            Assignee: Yingjie Cao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> This is the second ingredient, besides FLINK-16404, for solving the deadlock 
> problem without exclusive buffers.
> The scenario is as follows:
>  * Data in a subpartition with a positive backlog can always be sent, because 
> the exclusive credits are eventually fed back to the sender.
>  * Without exclusive buffers, the receiver does not request floating buffers 
> for a backlog of 0. But when new backlog is added to such a subpartition, the 
> sender currently has no way to notify the receiver without positive credits.
>  * As a result, the receiver and the sender wait for each other and deadlock: 
> the sender waits for credit to announce its backlog, and the receiver waits 
> for the backlog to request floating credits.
> To solve this problem, the sender needs a separate message, in addition to 
> the existing `BufferResponse`, to announce its backlog when necessary. The 
> receiver can then use this information to request floating buffers and feed 
> credits back to the sender.
> The side effect is increased network transport delay and a throughput 
> regression. We can measure the impact with the existing micro-benchmarks. 
> This cost is probably acceptable in exchange for fast checkpoints without 
> exclusive buffers. We can explain the trade-off in the respective 
> configuration options and let users make the final decision in practice.
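
The deadlock and the proposed fix in the description above can be illustrated 
with a small toy model. This is only a sketch under simplifying assumptions, 
not Flink's actual implementation: the class and method names 
(`BacklogAnnouncementSketch`, `Sender`, `Receiver`, `onBacklog`, `flush`) are 
hypothetical, everything runs in a single thread, and received buffers are 
assumed to be recycled immediately.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of credit-based flow control without exclusive buffers. All names are hypothetical. */
public class BacklogAnnouncementSketch {

    /** Receiver with floating buffers only; it grants credit only after it learns of a backlog. */
    public static final class Receiver {
        private int availableFloatingBuffers;
        private int outstandingCredits; // credits granted but not yet used by the sender
        public final Queue<String> received = new ArrayDeque<>();

        public Receiver(int floatingBuffers) {
            this.availableFloatingBuffers = floatingBuffers;
        }

        /** Handles a backlog report (piggybacked or announced) and returns the credits granted. */
        public int onBacklog(int backlog) {
            int needed = Math.max(0, backlog - outstandingCredits);
            int granted = Math.min(needed, availableFloatingBuffers);
            availableFloatingBuffers -= granted;
            outstandingCredits += granted;
            return granted;
        }

        /** Handles a data buffer; assumes it is processed and recycled immediately. */
        public void onBuffer(String data) {
            outstandingCredits--;
            availableFloatingBuffers++;
            received.add(data);
        }
    }

    /** Sender that starts with zero credits because the receiver has no exclusive buffers. */
    public static final class Sender {
        private final Queue<String> backlog = new ArrayDeque<>();
        private final boolean announceBacklog;
        private int credits;

        public Sender(boolean announceBacklog) {
            this.announceBacklog = announceBacklog;
        }

        public void add(String data) {
            backlog.add(data);
        }

        /** Tries to make progress and returns the number of buffers sent. */
        public int flush(Receiver receiver) {
            if (credits == 0 && !backlog.isEmpty() && announceBacklog) {
                // The separate message proposed in the issue: it needs no credit.
                credits += receiver.onBacklog(backlog.size());
            }
            int sent = 0;
            while (credits > 0 && !backlog.isEmpty()) {
                credits--;
                receiver.onBuffer(backlog.poll());
                sent++;
                // Data messages carry the remaining backlog, so more credit may come back.
                credits += receiver.onBacklog(backlog.size());
            }
            return sent;
        }
    }

    public static void main(String[] args) {
        for (boolean announce : new boolean[] {false, true}) {
            Receiver receiver = new Receiver(2);
            Sender sender = new Sender(announce);
            sender.add("a");
            sender.add("b");
            sender.add("c");
            System.out.println("announce=" + announce + " sent=" + sender.flush(receiver));
        }
    }
}
```

With `announceBacklog` set to `false`, the sender never gets a first credit and 
`flush` sends nothing, which corresponds to the deadlock described above. With 
it set to `true`, the credit-free announcement bootstraps the exchange and all 
three buffers are sent. The extra message per idle-to-busy transition is the 
source of the added delay that the description mentions.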



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
