Hi Ron and Lijie,
Thanks for joining the discussion and sharing your suggestions.

> the InferMode class should also be introduced in the Public Interfaces
> section!


Thanks for the reminder, I have now added the InferMode class to the Public
Interfaces section as well.

> `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> the code that the default value is 1000?


I have checked and the default value of
`table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has
been corrected in the FLIP.

> how are`table.exec.hive.infer-source-parallelism` and
> `table.exec.hive.infer-source-parallelism.mode` compatible?


This is indeed a critical point. The current plan is to deprecate
`table.exec.hive.infer-source-parallelism` but still utilize it as the main
switch for enabling automatic parallelism inference. That means, in terms
of priority, `table.exec.hive.infer-source-parallelism` >
`table.exec.hive.infer-source-parallelism.mode`. In future versions, if
`table.exec.hive.infer-source-parallelism` is removed, this logic will also
need to be revised, leaving only
`table.exec.hive.infer-source-parallelism.mode` as the basis for deciding
whether to enable parallelism inference. I have also added this description
to the FLIP.


> In FLIP-367 it is supported to be able to set the Source's parallelism
> individually, if in the future HiveSource also supports this feature,
> however, the default value of
> `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`, at
> this point will the parallelism be dynamically derived or will the manually
> set parallelism take effect, and who has the higher priority?


>From my understanding, 'manually set parallelism' has the higher priority,
just like one of the preconditions for the effectiveness of dynamic
parallelism inference in the AdaptiveBatchScheduler is that the vertex's
parallelism isn't set. I believe whether it's static inference or dynamic
inference, the manually set parallelism by the user should be respected.

> The `InferMode.NONE` option.

Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I
will add InferMode.NONE as one of the Enum options in InferMode class.

Best,
Xia

Lijie Wang <wangdachui9...@gmail.com> 于2024年4月18日周四 13:50写道:

> Thanks for driving the discussion.
>
> +1 for the proposal and +1 for the `InferMode.NONE` option.
>
> Best,
> Lijie
>
> Ron liu <ron9....@gmail.com> 于2024年4月18日周四 11:36写道:
>
> > Hi, Xia
> >
> > Thanks for driving this FLIP.
> >
> > This proposal looks good to me overall. However, I have the following
> minor
> > questions:
> >
> > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a
> new
> > parameter, and the value is the enum class `InferMode`, I think the
> > InferMode class should also be introduced in the Public Interfaces
> section!
> > 2. You mentioned in FLIP that the default value of
> > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> > the code that the default value is 1000?
> > 3. I also agree with Muhammet's idea that there is no need to introduce
> the
> > option `table.exec.hive.infer-source-parallelism.enabled`, and that
> > expanding the InferMode values will fulfill the need. There is another
> > issue to consider here though, how are
> > `table.exec.hive.infer-source-parallelism` and
> > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > 4. In FLIP-367 it is supported to be able to set the Source's parallelism
> > individually, if in the future HiveSource also supports this feature,
> > however, the default value of
> > `table.exec.hive.infer-source-parallelism.mode` is `InferMode. DYNAMIC`,
> at
> > this point will the parallelism be dynamically derived or will the
> manually
> > set parallelism take effect, and who has the higher priority?
> >
> > Best,
> > Ron
> >
> > Xia Sun <xingbe...@gmail.com> 于2024年4月17日周三 12:08写道:
> >
> > > Hi Jeyhun, Muhammet,
> > > Thanks for all the feedback!
> > >
> > > > Could you please mention the default values for the new
> configurations
> > > >     (e.g., table.exec.hive.infer-source-parallelism.mode,
> > > >     table.exec.hive.infer-source-parallelism.enabled,
> > > >     etc) ?
> > >
> > >
> > > Thanks for your suggestion. I have supplemented the explanation
> regarding
> > > the default values.
> > >
> > > > Since we are introducing the mode as a configuration option,
> > > >     could it make sense to have `InferMode.NONE` option also?
> > > >     The `NONE` option would disable the inference.
> > >
> > >
> > > This is a good idea. Looking ahead, it could eliminate the need for
> > > introducing
> > > a new configuration option. I haven't identified any potential
> > > compatibility issues
> > > as yet. If there are no further ideas from others, I'll go ahead and
> > update
> > > the FLIP to
> > > introducing InferMode.NONE.
> > >
> > > Best,
> > > Xia
> > >
> > > Muhammet Orazov <mor+fl...@morazow.com.invalid> 于2024年4月17日周三 10:31写道:
> > >
> > > > Hello Xia,
> > > >
> > > > Thanks for the FLIP!
> > > >
> > > > Since we are introducing the mode as a configuration option,
> > > > could it make sense to have `InferMode.NONE` option also?
> > > > The `NONE` option would disable the inference.
> > > >
> > > > This way we deprecate the `table.exec.hive.infer-source-parallelism`
> > > > and no additional `table.exec.hive.infer-source-parallelism.enabled`
> > > > option is required.
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Muhammet
> > > >
> > > > On 2024-04-16 07:07, Xia Sun wrote:
> > > > > Hi everyone,
> > > > > I would like to start a discussion on FLIP-445: Support dynamic
> > > > > parallelism
> > > > > inference for HiveSource[1].
> > > > >
> > > > > FLIP-379[2] has introduced dynamic source parallelism inference for
> > > > > batch
> > > > > jobs, which can utilize runtime information to more accurately
> decide
> > > > > the
> > > > > source parallelism. As a follow-up task, we plan to implement the
> > > > > dynamic
> > > > > parallelism inference interface for HiveSource, and also switch the
> > > > > default
> > > > > static parallelism inference to dynamic parallelism inference.
> > > > >
> > > > > Looking forward to your feedback and suggestions, thanks.
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource
> > > > > [2]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
> > > > >
> > > > > Best regards,
> > > > > Xia
> > > >
> > >
> >
>

Reply via email to