[ https://issues.apache.org/jira/browse/FLINK-25688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Piotr Nowojski updated FLINK-25688:
-----------------------------------
    Description:
As documented in FLINK-25646, buffer debloating in Flink currently shows, at least in the default configuration, a quite noticeable performance degradation at larger scale. For example, throughput can drop by a factor of 4, and checkpointing times can even increase. Currently it's not clear why this is happening. It looks like increasing the number of buffers per channel from the default ~2 to above 3 (for example by bumping the number of floating buffers to a value equal to or higher than the parallelism) solves this problem, at least on the one cluster where buffer debloating has been tested at large scale.

Maybe a solution is to change Flink's default configuration by increasing the number of exclusive or floating buffers, perhaps only when buffer debloating is enabled. However, further investigation is required.

CC [~akalashnikov]

  was:
As documented in FLINK-25646, buffer debloating in Flink currently shows, at least in the default configuration, a quite noticeable performance degradation at larger scale. For example, throughput can drop by a factor of 4, and checkpointing times can even increase. Currently it's not clear why this is happening. It looks like increasing the number of buffers per channel from the default ~2 to above 3 (for example by bumping the number of floating buffers to a value equal to or higher than the parallelism) solves this problem, at least on the one cluster where buffer debloating has been tested at large scale.

Further investigation is required.
CC [~akalashnikov]


> Resolve performance degradation with high parallelism when using buffer debloating
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-25688
>                 URL: https://issues.apache.org/jira/browse/FLINK-25688
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>    Affects Versions: 1.15.0, 1.14.3
>            Reporter: Piotr Nowojski
>            Priority: Not a Priority
>
> As documented in FLINK-25646, buffer debloating in Flink currently shows, at
> least in the default configuration, a quite noticeable performance degradation
> at larger scale. For example, throughput can drop by a factor of 4, and
> checkpointing times can even increase. Currently it's not clear why this is
> happening. It looks like increasing the number of buffers per channel from
> the default ~2 to above 3 (for example by bumping the number of floating
> buffers to a value equal to or higher than the parallelism) solves this
> problem, at least on the one cluster where buffer debloating has been tested
> at large scale.
> Maybe a solution is to change Flink's default configuration by increasing
> the number of exclusive or floating buffers, perhaps only when buffer
> debloating is enabled. However, further investigation is required.
> CC [~akalashnikov]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
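
The "~2 to above 3" figures in the description can be reproduced with a little arithmetic. The sketch below is only an illustration, not Flink code: it assumes one input channel per upstream subtask, assumes the defaults of 2 exclusive buffers per channel (taskmanager.network.memory.buffers-per-channel) and 8 floating buffers per gate (taskmanager.network.memory.floating-buffers-per-gate), and assumes floating buffers are shared evenly across the channels of a gate. The function name is hypothetical.

```python
def average_buffers_per_channel(parallelism, exclusive_per_channel=2, floating_per_gate=8):
    """Rough average number of buffers available to each input channel of one gate.

    Assumptions (not taken from the issue): one channel per upstream subtask,
    and floating buffers spread evenly over all channels of the gate.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return exclusive_per_channel + floating_per_gate / parallelism


if __name__ == "__main__":
    # Assumed defaults at high parallelism: floating buffers barely contribute.
    print(average_buffers_per_channel(1000))
    # Floating buffers bumped to the parallelism: about one extra buffer per channel.
    print(average_buffers_per_channel(1000, floating_per_gate=1000))
```

Under these assumptions, the default floating-buffer count adds almost nothing per channel once the parallelism is in the hundreds, which would match the "default ~2" in the description, while setting the floating buffers to at least the parallelism adds one or more buffers per channel on average.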