Venkata krishnan Sowrirajan created FLINK-35165: ---------------------------------------------------
Summary: AdaptiveBatch Scheduler should not restrict the default source parallelism to the max parallelism set
Key: FLINK-35165
URL: https://issues.apache.org/jira/browse/FLINK-35165
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Reporter: Venkata krishnan Sowrirajan

Copy-pasting the reasoning mentioned on this [discussion thread|https://lists.apache.org/thread/o887xhvvmn2rg5tyymw348yl2mqt23o7].

Let me state why I think "{_}jobmanager.adaptive-batch-scheduler.default-source-parallelism{_}" should not be bound by "{_}jobmanager.adaptive-batch-scheduler.max-parallelism{_}":
* The source vertex is unique: it has no upstream vertices. Downstream vertices read shuffled data partitioned by key, which is not the case for the source vertex.
* Limiting the source parallelism by the downstream vertices' max parallelism is therefore incorrect.
* If we say that, for "semantic consistency", the source vertex parallelism has to be bound by the overall job's max parallelism, it can lead to the following issues:
** High filter selectivity with huge amounts of data to read: the source needs high parallelism to scan the input, even though little data remains after filtering.
** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" just so that the source parallelism can be set higher can lead to small blocks and sub-optimal performance.
** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" also requires careful tuning of the network buffer configuration, which would otherwise be unnecessary if the only goal is a high source parallelism.
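To make the disagreement concrete, here is a minimal Java sketch contrasting the two behaviors. The class and method names are hypothetical, the current behavior is modelled as a simple clamp (Flink's scheduler may instead validate or adjust these settings differently), and the values 1024 and 128 are illustrative only.

```java
// Hypothetical sketch, not Flink's scheduler code: the names and the clamp
// model are assumptions used only to illustrate the argument in this issue.
public final class SourceParallelismSketch {

    /** Current behavior as modelled here: the source default is capped by the max parallelism. */
    static int cappedSourceParallelism(int defaultSourceParallelism, int maxParallelism) {
        return Math.min(defaultSourceParallelism, maxParallelism);
    }

    /** Proposed behavior: the source has no upstream shuffle, so its default stands on its own. */
    static int independentSourceParallelism(int defaultSourceParallelism) {
        return defaultSourceParallelism;
    }

    /** Downstream vertices read key-partitioned shuffle data, so they stay bounded by the max parallelism. */
    static int downstreamParallelism(int inferredParallelism, int maxParallelism) {
        return Math.min(inferredParallelism, maxParallelism);
    }

    public static void main(String[] args) {
        int maxParallelism = 128;            // illustrative value for ...max-parallelism
        int defaultSourceParallelism = 1024; // illustrative value for ...default-source-parallelism
        int inferredDownstream = 16;         // illustrative: little data left after a selective filter

        System.out.println("source (capped):      "
                + cappedSourceParallelism(defaultSourceParallelism, maxParallelism));
        System.out.println("source (independent): "
                + independentSourceParallelism(defaultSourceParallelism));
        System.out.println("downstream:           "
                + downstreamParallelism(inferredDownstream, maxParallelism));
    }
}
```

With these values the capped source runs with 128 subtasks while the independent one keeps 1024; the downstream vertex is bounded by the max parallelism either way, so raising the source parallelism does not require raising the max parallelism.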