[ https://issues.apache.org/jira/browse/FLINK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521540#comment-17521540 ]
Piotr Nowojski commented on FLINK-24578:
----------------------------------------

As a next step in this ticket, it might be a good idea to double-check whether the same performance regression seen with debloating enabled is also visible after manually decreasing the buffer size to a value similar to the debloated one for the given job.

> Unexpected erratic load shape for channel skew load profile and ~10% performance loss with enabled debloating
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24578
>                 URL: https://issues.apache.org/jira/browse/FLINK-24578
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.0
>            Reporter: Anton Kalashnikov
>            Priority: Major
>         Attachments: antiphaseBufferSize.png, erraticBufferSize1.png, erraticBufferSize2.png
>
>
> given:
> The job with 5 maps (with keyBy).
> All channels are remote. Parallelism is 80.
> The first task produces only two keys: `indexOfThisSubtask` and
> `indexOfThisSubtask + 1`. So every subtask has a constant number of active
> channels (depending on the hash rebalance).
> Every record has an equal size and takes an equal time to process.
>
> when:
> The buffer debloat is enabled with the default configuration.
>
> then:
> The buffer size synchronizes on every subtask on the first map for some
> reason. The synchronization can be strong, as shown in the
> erraticBufferSize1 picture, but usually it is less explicit, as in
> erraticBufferSize2.
> !erraticBufferSize1.png!
> !erraticBufferSize2.png!
>
> Expected:
> After the stabilization period, the buffer size should be mostly constant with
> small fluctuations, or the different tasks should be in antiphase to each
> other (when one subtask has a small buffer size, another should have a big
> buffer size), for example as in the antiphaseBufferSize picture:
> !antiphaseBufferSize.png!
>
> Unfortunately, this is not reproduced every time, which means that the problem
> may be connected to the environment. But at least it makes sense to try to
> understand why we get such a strange load shape when only several input
> channels are active.



--
This message was sent by Atlassian Jira (v8.20.1#820001)