>
> Unsubscribe
To unsubscribe from the user@flink.apache.org mailing list, please send any
email to user-unsubscr...@flink.apache.org. Sending mail to
user@flink.apache.org will not unsubscribe you.
> Sent from my iPhone
>
>
> -- Original --
> From: Tony Wei
> Date: Tue, Mar 14, 2023 1:11 PM
> To: David Anderson
> Cc: Hangxiang Yu, user
> Subject: Re: is there any detrimental side-effect if i set the max
> parallelism as 32768
>
> Hi Hangxiang, David,
>
> Thank you for your replies. Your responses are very helpful.
>
> Best regards,
> Tony Wei
>
> On Tue, Mar 14, 2023 at 12:12 PM David Anderson <dander...@apache.org> wrote:
> I believe there is some noticeable overhead if you are using the
> heap-based state backend, but with RocksDB I think the difference is
> negligible.
>
> David
>
> On Tue, Mar 7, 2023 at 11:10 PM Hangxiang Yu <master...@gmail.com> wrote:
> >
> > Hi, Tony.
> > "Be detrimental to performance" here means that the extra space taken by
> > the key-group field may affect performance.
> > As you may know, Flink writes the key group as a prefix of the key to
> > speed up rescaling,
> > so the format looks like: key group | key len | key | ..
> > The relationship between max parallelism and the number of bytes used for
> > the key group is as follows:
> > --
> > max parallelism    bytes of key group
> > 128                1
> > 32768              2
> > --
> > So I think the cost will be very small if the real key length >> 2 bytes.
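
A minimal sketch of the arithmetic above, not Flink's actual code. It assumes the two table rows are the thresholds (up to 128 key groups fit in one byte, anything larger up to 32768 needs two). The function names are hypothetical, and the "key len" field and the value bytes are ignored to keep the comparison simple.

```python
def key_group_prefix_bytes(max_parallelism: int) -> int:
    """Bytes used for the key-group prefix, following the table above.

    Assumption: 1 byte covers up to 128 key groups, 2 bytes cover up to 32768.
    """
    if not 1 <= max_parallelism <= 32768:
        raise ValueError("max_parallelism must be between 1 and 32768")
    return 1 if max_parallelism <= 128 else 2


def prefix_share(max_parallelism: int, key_len_bytes: int) -> float:
    """Fraction of (prefix + key) bytes taken by the key-group prefix."""
    if key_len_bytes <= 0:
        raise ValueError("key_len_bytes must be positive")
    prefix = key_group_prefix_bytes(max_parallelism)
    return prefix / (prefix + key_len_bytes)


if __name__ == "__main__":
    # Illustrative key lengths only; real keys depend on the job.
    for key_len in (2, 16, 100):
        share = prefix_share(32768, key_len)
        print(f"key of {key_len} bytes: prefix is {share:.1%} of prefix + key")
```

With a 2-byte key the prefix is half of the stored prefix-plus-key bytes, while with a 100-byte key it is about 2%, which is the sense in which the cost is small when the key is much longer than 2 bytes.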
> >
> > On Wed, Mar 8, 2023 at 1:06 PM Tony Wei <tony19920...@gmail.com> wrote:
> >>
> >> Hi experts,
> >>
> >>> Setting the maximum parallelism to a very large value can be detrimental
> >>> to performance because some state backends have to keep internal data
> >>> structures that scale with the number of key-groups (which are the
> >>> internal implementation mechanism for rescalable state).
> >>>
> >>> Changing the maximum parallelism explicitly when recovery from original
> >>> job will lead to state incompatibility.
> >>
> >>
> >> I read the section above in the official Flink documentation [1], and I'm
> >> wondering about the details of this side effect.
> >>
> >> Suppose I have a Flink SQL job with large state and large parallelism that
> >> uses RocksDB as its state backend.
> >> I would like to set the max parallelism to 32768 so that I don't have to
> >> worry about whether the max parallelism is divisible by the parallelism
> >> whenever I scale my job,
> >> because the number of key groups will not differ much between
> >> subtasks.
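
A minimal sketch of why a large max parallelism evens out the load, assuming key groups are split into contiguous, near-equal ranges per subtask. The range formula below is an assumption about how such a split can be computed, not a quote of Flink's code. The function names and the parallelism of 100 are hypothetical.

```python
def key_group_range(max_parallelism: int, parallelism: int, subtask_index: int):
    """Inclusive (start, end) key-group range for one subtask.

    Assumption: contiguous ranges with start = ceil(i * M / p) and
    end = ceil((i + 1) * M / p) - 1, written with integer arithmetic.
    """
    start = (subtask_index * max_parallelism + parallelism - 1) // parallelism
    end = ((subtask_index + 1) * max_parallelism - 1) // parallelism
    return start, end


def key_groups_per_subtask(max_parallelism: int, parallelism: int):
    """Number of key groups assigned to each subtask."""
    sizes = []
    for i in range(parallelism):
        start, end = key_group_range(max_parallelism, parallelism, i)
        sizes.append(end - start + 1)
    return sizes


if __name__ == "__main__":
    # Parallelism 100 does not divide either max parallelism evenly.
    for max_parallelism in (128, 32768):
        sizes = key_groups_per_subtask(max_parallelism, 100)
        print(max_parallelism, "min:", min(sizes), "max:", max(sizes))
```

Under this split, every subtask gets either floor(M / p) or ceil(M / p) key groups. With max parallelism 128 and parallelism 100, some subtasks hold two key groups and others one, so the busiest subtask has twice the key groups of the lightest. With 32768 the counts are 327 or 328, a difference of well under 1%.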
> >>
> >> I'm wondering whether this is good practice, because according to the
> >> official documentation it is not recommended.
> >> If possible, I would like to know the details of this side effect. Which
> >> state backends have this issue, and why?
> >> Any advice would be appreciated. Thanks in advance.
> >>
> >> Best regards,
> >> Tony Wei
> >>
> >> [1]
> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution/parallel/#setting-the-maximum-parallelism
> >
> >
> >
> > --
> > Best,
> > Hangxiang.