Zhijiang created FLINK-16645:
--------------------------------

             Summary: Limit the maximum backlogs in subpartitions for data skew 
case
                 Key: FLINK-16645
                 URL: https://issues.apache.org/jira/browse/FLINK-16645
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Network
            Reporter: Zhijiang
             Fix For: 1.11.0


In the case of data skew, most of the buffers in partition's LocalBufferPool 
are probably requested away and accumulated in certain subpartition, which 
would increase in-flight data to slow down the barrier alignment.

We can set up a proper config to control how many backlogs are allowed for one 
subpartition. If one subpartition reaches this threshold, it will make the 
buffer pool unavailable which blocks task processing continuously. Then we can 
reduce the in-flight data for speeding up checkpoint process a bit and not 
impact on the performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to