[ 
https://issues.apache.org/jira/browse/FLINK-35165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839806#comment-17839806
 ] 

Venkata krishnan Sowrirajan commented on FLINK-35165:
-----------------------------------------------------

JFYI, I am working on the fix for this issue.

> AdaptiveBatch Scheduler should not restrict the default source parallelism to 
> the max parallelism set
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35165
>                 URL: https://issues.apache.org/jira/browse/FLINK-35165
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>            Reporter: Venkata krishnan Sowrirajan
>            Priority: Major
>
> Copy-pasting the reasoning mentioned on this [discussion 
> thread|https://lists.apache.org/thread/o887xhvvmn2rg5tyymw348yl2mqt23o7].
> Let me state why I think 
> "{_}jobmanager.adaptive-batch-scheduler.default-source-parallelism{_}" should 
> not be bound by the 
> "{_}jobmanager.adaptive-batch-scheduler.max-parallelism{_}".
>  * The source vertex is unique in that it has no upstream vertices. 
> Downstream vertices read shuffled data partitioned by key, which is not the 
> case for the source vertex.
>  * Limiting source parallelism by downstream vertices' max parallelism is 
> incorrect
>  * If we say that, for "semantic consistency", the source vertex parallelism 
> has to be bound by the overall job's max parallelism, it can lead to the 
> following issues:
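>  ** High filter selectivity with huge amounts of data to read: the source 
> needs high parallelism to scan the input, while downstream vertices receive 
> relatively little data.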
>  ** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" 
> just so that source parallelism can be set higher can lead to small blocks 
> and sub-optimal performance.
>  ** Setting a high "*jobmanager.adaptive-batch-scheduler.max-parallelism*" 
> requires careful tuning of network buffer configurations, which would 
> otherwise be unnecessary, just so that the source parallelism can be set 
> high.
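To make the argument concrete, here is a minimal Python model of the two behaviors. It is a sketch under stated assumptions, not Flink code: the function names are hypothetical, the "current" behavior is assumed to be a simple cap of the default source parallelism at max parallelism (as the issue title describes), and the block-size estimate assumes each upstream task splits its output into max-parallelism subpartitions so the downstream parallelism can be decided later.

```python
# Hypothetical model of the behavior discussed in FLINK-35165; not Flink's API.


def current_source_parallelism(default_source_parallelism: int, max_parallelism: int) -> int:
    # Assumed current behavior: the configured default source parallelism
    # is capped at the scheduler-wide max parallelism.
    return min(default_source_parallelism, max_parallelism)


def proposed_source_parallelism(default_source_parallelism: int, max_parallelism: int) -> int:
    # Proposed behavior: the source vertex has no upstream shuffle, so the
    # max parallelism (which bounds key-partitioned downstream consumers)
    # is not applied to it. max_parallelism is intentionally unused.
    return default_source_parallelism


def approx_subpartition_bytes(bytes_per_upstream_task: int, max_parallelism: int) -> float:
    # Rough model (assumption): each upstream task splits its output into
    # max_parallelism subpartitions, so raising max_parallelism shrinks
    # every block proportionally.
    return bytes_per_upstream_task / max_parallelism


if __name__ == "__main__":
    # Hypothetical numbers: a large scan that needs 2000 source tasks, while
    # a selective filter leaves downstream vertices with little data.
    source_wanted, max_par = 2000, 500
    print(current_source_parallelism(source_wanted, max_par))   # 500
    print(proposed_source_parallelism(source_wanted, max_par))  # 2000

    # Raising max parallelism from 500 to 2000 only to unblock the source
    # makes each subpartition of a 1 GiB upstream task output 4x smaller.
    gib = 1024 ** 3
    print(approx_subpartition_bytes(gib, 500))
    print(approx_subpartition_bytes(gib, 2000))
```

The model deliberately keeps max parallelism as a property of shuffled (downstream) vertices only; under the proposal, the source bound would come solely from the default source parallelism setting.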



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
