[ 
https://issues.apache.org/jira/browse/FLINK-33940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800444#comment-17800444
 ] 

Zhanghao Chen commented on FLINK-33940:
---------------------------------------

Thanks [~gyfora] for pointing that. We also observe some data skew introduced 
by imbalanced key group assignment in our production use, using highly 
composite numbers should be useful. How about using the following formula: 
{{{}min(max({*}roundUpToHighlyCompositeNum{*}(operatorParallelism * {*}5{*}), 
{*}840{*}), {*}45360{*}){}}}, where the highly composite number series is 
defined in [A002182 - OEIS|https://oeis.org/A002182]?

> Update the auto-derivation rule of max parallelism for enlarged upscaling 
> space
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-33940
>                 URL: https://issues.apache.org/jira/browse/FLINK-33940
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Core
>            Reporter: Zhanghao Chen
>            Assignee: Zhanghao Chen
>            Priority: Major
>
> *Background*
> The choice of the max parallelism of an stateful operator is important as it 
> limits the upper bound of the parallelism of the opeartor while it can also 
> add extra overhead when being set too large. Currently, the max parallelism 
> of an opeartor is either fixed to a value specified by API core / pipeline 
> option or auto-derived with the following rules:
> {{min(max(roundUpToPowerOfTwo(operatorParallelism * 1.5), 128), 32767)}}
> *Problem*
> Recently, the elasticity of Flink jobs is becoming more and more valued by 
> users. The current auto-derived max parallelism was introduced a time time 
> ago and only allows the operator parallelism to be roughly doubled, which is 
> not desired for elasticity. Setting an max parallelism manually may not be 
> desired as well: users may not have the sufficient expertise to select a good 
> max-parallelism value.
> *Proposal*
> Update the auto-derivation rule of max parallelism to derive larger max 
> parallelism for better elasticity experience out of the box. A candidate is 
> as follows:
> {{min(max(roundUpToPowerOfTwo(operatorParallelism * {*}5{*}), {*}1024{*}), 
> 32767)}}
> Looking forward to your opinions on this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to