Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2024-01-07 Thread Xuannan Su
Hi all,

Thanks for the discussion. I think all the comments and questions have
been addressed. I will open the voting thread today.

Best,
Xuannan


On Tue, Jan 2, 2024 at 11:59 AM Xuannan Su  wrote:
>
> Hi all,
>
> Thank you for all your comments! The FLIP has been updated
> accordingly. Please let me know if you have any further questions or
> comments.
>
> Also, note that many people are on Christmas break, so we will keep
> the discussion open for another week.
>
> Best,
> Xuannan
>
> On Wed, Dec 27, 2023 at 5:20 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > After some investigation, it turns out those options of input/output
> > > format are only publicly exposed in the DataSet docs[2], which is
> > > deprecated. Thus, marking them as deprecated and removed in Flink 2.0
> > > looks fine to me.
> >
> > Thanks Xuannan for the detailed investigation, if so, deprecate them
> > and removing them in Flink 2.0 looks good to me.
> >
> > > I think the key of LOCAL_NUMBER_TASK_MANAGER is better as
> > > 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number'
> > > instead of 'minicluster.number-taskmanager'.
> >
> > Thanks Hang for the good suggestion! 'minicluster.number-of-taskmanagers'
> > sounds good to me, it's similar to taskmanager.numberOfTaskSlots.
> >
> > Best,
> > Rui
> >
> > On Wed, Dec 27, 2023 at 1:56 PM Hang Ruan  wrote:
> >>
> >> Hi, Rui Fan.
> >>
> >> Thanks for this FLIP.
> >>
> >> I think the key of LOCAL_NUMBER_TASK_MANAGER is better as
> >> 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number'
> >> instead of 'minicluster.number-taskmanager'.
> >>
> >> Best,
> >> Hang
> >>
> >> Xuannan Su  于2023年12月27日周三 12:40写道:
> >>
> >> > Hi Xintong and Rui,
> >> >
> >> > Thanks for the quick feedback and the suggestions.
> >> >
> >> > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be
> >> > "no
> >> > > default".
> >> >
> >> > I have considered both ways of describing the default value. However,
> >> > I found out that some of the configurations, such as `web.tmpdir`, put
> >> > `System.getProperty()` in the default value [1]. Some are putting the
> >> > description in the default value column[2]. So I just picked the first
> >> > one. I am fine with either way, so long as they are consistent. WDYT?
> >> >
> >> > > 3. Simply saying "getting / setting value with string key is 
> >> > > discouraged"
> >> > > in JavaDoc of get/setString is IMHO a bit confusing. People may have 
> >> > > the
> >> > > question why would we keep the discouraged interfaces at all. I would
> >> > > suggest the following:
> >> > > ```
> >> > > We encourage users and developers to always use ConfigOption for 
> >> > > getting
> >> > /
> >> > > setting the configurations if possible, for its rich description, type,
> >> > > default-value and other supports. The string-key-based getter / setter
> >> > > should only be used when ConfigOption is not applicable, e.g., the key 
> >> > > is
> >> > > programmatically generated in runtime.
> >> > > ```
> >> >
> >> > The suggested comment looks good to me. Thanks for the suggestion. I
> >> > will update the comment in the FLIP.
> >> >
> >> > > 2. So I wonder if we can simply mark them as deprecated and remove in
> >> > 2.0.
> >> >
> >> > After some investigation, it turns out those options of input/output
> >> > format are only publicly exposed in the DataSet docs[2], which is
> >> > deprecated. Thus, marking them as deprecated and removed in Flink 2.0
> >> > looks fine to me.
> >> >
> >> >
> >> > @Rui
> >> >
> >> > > Configuration has a `public  T get(ConfigOption option)` method.
> >> > > Could we remove all `Xxx getXxx(ConfigOption configOption)` 
> >> > > methods?
> >> >
> >> > +1 Only keep the get(ConfigOption option),
> >> > getOptional(ConfigOption option), and set(ConfigOption option, T
> >> > value).
> >> >
> >> > Best,
> >> > Xuannan
> >> >
> >> > [1]
> >> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir
> >> > [2]
> >> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref
> >> > [3]
> >> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Dec 26, 2023 at 8:47 PM Xintong Song 
> >> > wrote:
> >> > >
> >> > > >
> >> > > > Configuration has a `public  T get(ConfigOption option)` 
> >> > > > method.
> >> > > > Could we remove all `Xxx getXxx(ConfigOption configOption)`
> >> > methods?
> >> > >
> >> > >
> >> > >
> >> > > Note: all `public void setXxx(ConfigOption key, Xxx value)` 
> >> > > methods
> >> > > > can be replaced with `public  Configuration set(ConfigOption
> >> > option,
> >> > > > T value)` as well.
> >> > >
> >> > >
> >> > > +1
> >> > >
> >> > >
> >> > > Best,
> >> > >
> >> > > Xintong
> >> > >
> >> > >
> >> > >
> >> > > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song 
> >> > wrote:
> >> > >
> >> > > > These features don't have 

Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2024-01-01 Thread Xuannan Su
Hi all,

Thank you for all your comments! The FLIP has been updated
accordingly. Please let me know if you have any further questions or
comments.

Also, note that many people are on Christmas break, so we will keep
the discussion open for another week.

Best,
Xuannan

On Wed, Dec 27, 2023 at 5:20 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> > After some investigation, it turns out those options of input/output
> > format are only publicly exposed in the DataSet docs[2], which is
> > deprecated. Thus, marking them as deprecated and removed in Flink 2.0
> > looks fine to me.
>
> Thanks Xuannan for the detailed investigation, if so, deprecate them
> and removing them in Flink 2.0 looks good to me.
>
> > I think the key of LOCAL_NUMBER_TASK_MANAGER is better as
> > 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number'
> > instead of 'minicluster.number-taskmanager'.
>
> Thanks Hang for the good suggestion! 'minicluster.number-of-taskmanagers'
> sounds good to me, it's similar to taskmanager.numberOfTaskSlots.
>
> Best,
> Rui
>
> On Wed, Dec 27, 2023 at 1:56 PM Hang Ruan  wrote:
>>
>> Hi, Rui Fan.
>>
>> Thanks for this FLIP.
>>
>> I think the key of LOCAL_NUMBER_TASK_MANAGER is better as
>> 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number'
>> instead of 'minicluster.number-taskmanager'.
>>
>> Best,
>> Hang
>>
>> Xuannan Su  于2023年12月27日周三 12:40写道:
>>
>> > Hi Xintong and Rui,
>> >
>> > Thanks for the quick feedback and the suggestions.
>> >
>> > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be
>> > "no
>> > > default".
>> >
>> > I have considered both ways of describing the default value. However,
>> > I found out that some of the configurations, such as `web.tmpdir`, put
>> > `System.getProperty()` in the default value [1]. Some are putting the
>> > description in the default value column[2]. So I just picked the first
>> > one. I am fine with either way, so long as they are consistent. WDYT?
>> >
>> > > 3. Simply saying "getting / setting value with string key is discouraged"
>> > > in JavaDoc of get/setString is IMHO a bit confusing. People may have the
>> > > question why would we keep the discouraged interfaces at all. I would
>> > > suggest the following:
>> > > ```
>> > > We encourage users and developers to always use ConfigOption for getting
>> > /
>> > > setting the configurations if possible, for its rich description, type,
>> > > default-value and other supports. The string-key-based getter / setter
>> > > should only be used when ConfigOption is not applicable, e.g., the key is
>> > > programmatically generated in runtime.
>> > > ```
>> >
>> > The suggested comment looks good to me. Thanks for the suggestion. I
>> > will update the comment in the FLIP.
>> >
>> > > 2. So I wonder if we can simply mark them as deprecated and remove in
>> > 2.0.
>> >
>> > After some investigation, it turns out those options of input/output
>> > format are only publicly exposed in the DataSet docs[2], which is
>> > deprecated. Thus, marking them as deprecated and removed in Flink 2.0
>> > looks fine to me.
>> >
>> >
>> > @Rui
>> >
>> > > Configuration has a `public  T get(ConfigOption option)` method.
>> > > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods?
>> >
>> > +1 Only keep the get(ConfigOption option),
>> > getOptional(ConfigOption option), and set(ConfigOption option, T
>> > value).
>> >
>> > Best,
>> > Xuannan
>> >
>> > [1]
>> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir
>> > [2]
>> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref
>> > [3]
>> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources
>> >
>> >
>> >
>> >
>> > On Tue, Dec 26, 2023 at 8:47 PM Xintong Song 
>> > wrote:
>> > >
>> > > >
>> > > > Configuration has a `public  T get(ConfigOption option)` method.
>> > > > Could we remove all `Xxx getXxx(ConfigOption configOption)`
>> > methods?
>> > >
>> > >
>> > >
>> > > Note: all `public void setXxx(ConfigOption key, Xxx value)` methods
>> > > > can be replaced with `public  Configuration set(ConfigOption
>> > option,
>> > > > T value)` as well.
>> > >
>> > >
>> > > +1
>> > >
>> > >
>> > > Best,
>> > >
>> > > Xintong
>> > >
>> > >
>> > >
>> > > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song 
>> > wrote:
>> > >
>> > > > These features don't have a public option, but they work. I'm not sure
>> > > >> whether these features are used by some advanced users.
>> > > >> Actually, I think some of them are valuable! For example:
>> > > >>
>> > > >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
>> > > >>   allows users to define the start command of the yarn container.
>> > > >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows
>> > > >>   flink job reads all files under the directory even if it has nested
>> > > >> directories.
>> > > >>
>> > > >> This FLIP focuses on the 

Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-27 Thread Rui Fan
> After some investigation, it turns out those options of input/output
> format are only publicly exposed in the DataSet docs[2], which is
> deprecated. Thus, marking them as deprecated and removed in Flink 2.0
> looks fine to me.

Thanks Xuannan for the detailed investigation, if so, deprecate them
and removing them in Flink 2.0 looks good to me.

> I think the key of LOCAL_NUMBER_TASK_MANAGER is better as
> 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number'
> instead of 'minicluster.number-taskmanager'.

Thanks Hang for the good suggestion! 'minicluster.number-of-taskmanagers'
sounds good to me, it's similar to taskmanager.numberOfTaskSlots.

Best,
Rui

On Wed, Dec 27, 2023 at 1:56 PM Hang Ruan  wrote:

> Hi, Rui Fan.
>
> Thanks for this FLIP.
>
> I think the key of LOCAL_NUMBER_TASK_MANAGER is better as
> 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number'
> instead of 'minicluster.number-taskmanager'.
>
> Best,
> Hang
>
> Xuannan Su  于2023年12月27日周三 12:40写道:
>
> > Hi Xintong and Rui,
> >
> > Thanks for the quick feedback and the suggestions.
> >
> > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be
> > "no
> > > default".
> >
> > I have considered both ways of describing the default value. However,
> > I found out that some of the configurations, such as `web.tmpdir`, put
> > `System.getProperty()` in the default value [1]. Some are putting the
> > description in the default value column[2]. So I just picked the first
> > one. I am fine with either way, so long as they are consistent. WDYT?
> >
> > > 3. Simply saying "getting / setting value with string key is
> discouraged"
> > > in JavaDoc of get/setString is IMHO a bit confusing. People may have
> the
> > > question why would we keep the discouraged interfaces at all. I would
> > > suggest the following:
> > > ```
> > > We encourage users and developers to always use ConfigOption for
> getting
> > /
> > > setting the configurations if possible, for its rich description, type,
> > > default-value and other supports. The string-key-based getter / setter
> > > should only be used when ConfigOption is not applicable, e.g., the key
> is
> > > programmatically generated in runtime.
> > > ```
> >
> > The suggested comment looks good to me. Thanks for the suggestion. I
> > will update the comment in the FLIP.
> >
> > > 2. So I wonder if we can simply mark them as deprecated and remove in
> > 2.0.
> >
> > After some investigation, it turns out those options of input/output
> > format are only publicly exposed in the DataSet docs[2], which is
> > deprecated. Thus, marking them as deprecated and removed in Flink 2.0
> > looks fine to me.
> >
> >
> > @Rui
> >
> > > Configuration has a `public  T get(ConfigOption option)` method.
> > > Could we remove all `Xxx getXxx(ConfigOption configOption)`
> methods?
> >
> > +1 Only keep the get(ConfigOption option),
> > getOptional(ConfigOption option), and set(ConfigOption option, T
> > value).
> >
> > Best,
> > Xuannan
> >
> > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir
> > [2]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref
> > [3]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources
> >
> >
> >
> >
> > On Tue, Dec 26, 2023 at 8:47 PM Xintong Song 
> > wrote:
> > >
> > > >
> > > > Configuration has a `public  T get(ConfigOption option)`
> method.
> > > > Could we remove all `Xxx getXxx(ConfigOption configOption)`
> > methods?
> > >
> > >
> > >
> > > Note: all `public void setXxx(ConfigOption key, Xxx value)`
> methods
> > > > can be replaced with `public  Configuration set(ConfigOption
> > option,
> > > > T value)` as well.
> > >
> > >
> > > +1
> > >
> > >
> > > Best,
> > >
> > > Xintong
> > >
> > >
> > >
> > > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song 
> > wrote:
> > >
> > > > These features don't have a public option, but they work. I'm not
> sure
> > > >> whether these features are used by some advanced users.
> > > >> Actually, I think some of them are valuable! For example:
> > > >>
> > > >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
> > > >>   allows users to define the start command of the yarn container.
> > > >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows
> > > >>   flink job reads all files under the directory even if it has
> nested
> > > >> directories.
> > > >>
> > > >> This FLIP focuses on the refactor option, I'm afraid these features
> > are
> > > >> used
> > > >> in some production and removing these features will affect some
> flink
> > > >> jobs.
> > > >> So I prefer to keep these features, WDTY?
> > > >>
> > > >
> > > > First of all, I don't think we should support any knobs that users
> can
> > > > only learn how to use from reading Flink's internal codes. From this
> > > > perspective, for existing string-keyed knobs that are not mentioned
> in
> 

Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-26 Thread Hang Ruan
Hi, Rui Fan.

Thanks for this FLIP.

I think the key of LOCAL_NUMBER_TASK_MANAGER is better as
'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number'
instead of 'minicluster.number-taskmanager'.

Best,
Hang

Xuannan Su  于2023年12月27日周三 12:40写道:

> Hi Xintong and Rui,
>
> Thanks for the quick feedback and the suggestions.
>
> > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be
> "no
> > default".
>
> I have considered both ways of describing the default value. However,
> I found out that some of the configurations, such as `web.tmpdir`, put
> `System.getProperty()` in the default value [1]. Some are putting the
> description in the default value column[2]. So I just picked the first
> one. I am fine with either way, so long as they are consistent. WDYT?
>
> > 3. Simply saying "getting / setting value with string key is discouraged"
> > in JavaDoc of get/setString is IMHO a bit confusing. People may have the
> > question why would we keep the discouraged interfaces at all. I would
> > suggest the following:
> > ```
> > We encourage users and developers to always use ConfigOption for getting
> /
> > setting the configurations if possible, for its rich description, type,
> > default-value and other supports. The string-key-based getter / setter
> > should only be used when ConfigOption is not applicable, e.g., the key is
> > programmatically generated in runtime.
> > ```
>
> The suggested comment looks good to me. Thanks for the suggestion. I
> will update the comment in the FLIP.
>
> > 2. So I wonder if we can simply mark them as deprecated and remove in
> 2.0.
>
> After some investigation, it turns out those options of input/output
> format are only publicly exposed in the DataSet docs[2], which is
> deprecated. Thus, marking them as deprecated and removed in Flink 2.0
> looks fine to me.
>
>
> @Rui
>
> > Configuration has a `public  T get(ConfigOption option)` method.
> > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods?
>
> +1 Only keep the get(ConfigOption option),
> getOptional(ConfigOption option), and set(ConfigOption option, T
> value).
>
> Best,
> Xuannan
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir
> [2]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref
> [3]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources
>
>
>
>
> On Tue, Dec 26, 2023 at 8:47 PM Xintong Song 
> wrote:
> >
> > >
> > > Configuration has a `public  T get(ConfigOption option)` method.
> > > Could we remove all `Xxx getXxx(ConfigOption configOption)`
> methods?
> >
> >
> >
> > Note: all `public void setXxx(ConfigOption key, Xxx value)` methods
> > > can be replaced with `public  Configuration set(ConfigOption
> option,
> > > T value)` as well.
> >
> >
> > +1
> >
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song 
> wrote:
> >
> > > These features don't have a public option, but they work. I'm not sure
> > >> whether these features are used by some advanced users.
> > >> Actually, I think some of them are valuable! For example:
> > >>
> > >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
> > >>   allows users to define the start command of the yarn container.
> > >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows
> > >>   flink job reads all files under the directory even if it has nested
> > >> directories.
> > >>
> > >> This FLIP focuses on the refactor option, I'm afraid these features
> are
> > >> used
> > >> in some production and removing these features will affect some flink
> > >> jobs.
> > >> So I prefer to keep these features, WDTY?
> > >>
> > >
> > > First of all, I don't think we should support any knobs that users can
> > > only learn how to use from reading Flink's internal codes. From this
> > > perspective, for existing string-keyed knobs that are not mentioned in
> any
> > > public documentation, yes we can argue that they are functioning, but
> we
> > > can also argue that they are not really exposed to users. That means
> > > migrating them to ConfigOption is not a pure refactor, but would make
> > > something that used to be hidden from users now exposed to users. For
> such
> > > options, I personally would lean toward not exposing them. If we
> consider
> > > them as already exposed, then process-wise there's no problem in
> > > deprecating some infrequently-used options and removing them in a major
> > > version bump, and if they are proved needed later we can add them back
> > > anytime. On the other hand, if we consider them as not yet exposed,
> then
> > > removing them later would be a breaking change.
> > >
> > >
> > > Secondly, I don't really come up with any cases where users need to
> tune
> > > these knobs. E.g., why would we allow users to customize the yarn
> container
> > > start command while we already provide `env.java.opts`? And 

Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-26 Thread Xuannan Su
Hi Xintong and Rui,

Thanks for the quick feedback and the suggestions.

> 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no
> default".

I have considered both ways of describing the default value. However,
I found out that some of the configurations, such as `web.tmpdir`, put
`System.getProperty()` in the default value [1]. Some are putting the
description in the default value column[2]. So I just picked the first
one. I am fine with either way, so long as they are consistent. WDYT?

> 3. Simply saying "getting / setting value with string key is discouraged"
> in JavaDoc of get/setString is IMHO a bit confusing. People may have the
> question why would we keep the discouraged interfaces at all. I would
> suggest the following:
> ```
> We encourage users and developers to always use ConfigOption for getting /
> setting the configurations if possible, for its rich description, type,
> default-value and other supports. The string-key-based getter / setter
> should only be used when ConfigOption is not applicable, e.g., the key is
> programmatically generated in runtime.
> ```

The suggested comment looks good to me. Thanks for the suggestion. I
will update the comment in the FLIP.

> 2. So I wonder if we can simply mark them as deprecated and remove in 2.0.

After some investigation, it turns out those options of input/output
format are only publicly exposed in the DataSet docs[2], which is
deprecated. Thus, marking them as deprecated and removed in Flink 2.0
looks fine to me.


@Rui

> Configuration has a `public  T get(ConfigOption option)` method.
> Could we remove all `Xxx getXxx(ConfigOption configOption)` methods?

+1 Only keep the get(ConfigOption option),
getOptional(ConfigOption option), and set(ConfigOption option, T
value).

Best,
Xuannan

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir
[2] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref
[3] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources




On Tue, Dec 26, 2023 at 8:47 PM Xintong Song  wrote:
>
> >
> > Configuration has a `public  T get(ConfigOption option)` method.
> > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods?
>
>
>
> Note: all `public void setXxx(ConfigOption key, Xxx value)` methods
> > can be replaced with `public  Configuration set(ConfigOption option,
> > T value)` as well.
>
>
> +1
>
>
> Best,
>
> Xintong
>
>
>
> On Tue, Dec 26, 2023 at 8:44 PM Xintong Song  wrote:
>
> > These features don't have a public option, but they work. I'm not sure
> >> whether these features are used by some advanced users.
> >> Actually, I think some of them are valuable! For example:
> >>
> >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
> >>   allows users to define the start command of the yarn container.
> >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows
> >>   flink job reads all files under the directory even if it has nested
> >> directories.
> >>
> >> This FLIP focuses on the refactor option, I'm afraid these features are
> >> used
> >> in some production and removing these features will affect some flink
> >> jobs.
> >> So I prefer to keep these features, WDTY?
> >>
> >
> > First of all, I don't think we should support any knobs that users can
> > only learn how to use from reading Flink's internal codes. From this
> > perspective, for existing string-keyed knobs that are not mentioned in any
> > public documentation, yes we can argue that they are functioning, but we
> > can also argue that they are not really exposed to users. That means
> > migrating them to ConfigOption is not a pure refactor, but would make
> > something that used to be hidden from users now exposed to users. For such
> > options, I personally would lean toward not exposing them. If we consider
> > them as already exposed, then process-wise there's no problem in
> > deprecating some infrequently-used options and removing them in a major
> > version bump, and if they are proved needed later we can add them back
> > anytime. On the other hand, if we consider them as not yet exposed, then
> > removing them later would be a breaking change.
> >
> >
> > Secondly, I don't really come up with any cases where users need to tune
> > these knobs. E.g., why would we allow users to customize the yarn container
> > start command while we already provide `env.java.opts`? And what would be
> > the problem if Flink just doesn't support nested files? And even worse,
> > such knobs may provide chances for users to shoot themself in the foot.
> > E.g., what if %jvmmem% is missing from a user-provided container start
> > command? Admittedly, there might be a small fraction of advanced users that
> > know how to use these knobs. However, those users usually have their own
> > custom fork of Flink, and it should not be a big problem for them to build
> > such abilities by 

Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-26 Thread Xintong Song
>
> Configuration has a `public  T get(ConfigOption option)` method.
> Could we remove all `Xxx getXxx(ConfigOption configOption)` methods?



Note: all `public void setXxx(ConfigOption key, Xxx value)` methods
> can be replaced with `public  Configuration set(ConfigOption option,
> T value)` as well.


+1


Best,

Xintong



On Tue, Dec 26, 2023 at 8:44 PM Xintong Song  wrote:

> These features don't have a public option, but they work. I'm not sure
>> whether these features are used by some advanced users.
>> Actually, I think some of them are valuable! For example:
>>
>> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
>>   allows users to define the start command of the yarn container.
>> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows
>>   flink job reads all files under the directory even if it has nested
>> directories.
>>
>> This FLIP focuses on the refactor option, I'm afraid these features are
>> used
>> in some production and removing these features will affect some flink
>> jobs.
>> So I prefer to keep these features, WDTY?
>>
>
> First of all, I don't think we should support any knobs that users can
> only learn how to use from reading Flink's internal codes. From this
> perspective, for existing string-keyed knobs that are not mentioned in any
> public documentation, yes we can argue that they are functioning, but we
> can also argue that they are not really exposed to users. That means
> migrating them to ConfigOption is not a pure refactor, but would make
> something that used to be hidden from users now exposed to users. For such
> options, I personally would lean toward not exposing them. If we consider
> them as already exposed, then process-wise there's no problem in
> deprecating some infrequently-used options and removing them in a major
> version bump, and if they are proved needed later we can add them back
> anytime. On the other hand, if we consider them as not yet exposed, then
> removing them later would be a breaking change.
>
>
> Secondly, I don't really come up with any cases where users need to tune
> these knobs. E.g., why would we allow users to customize the yarn container
> start command while we already provide `env.java.opts`? And what would be
> the problem if Flink just doesn't support nested files? And even worse,
> such knobs may provide chances for users to shoot themself in the foot.
> E.g., what if %jvmmem% is missing from a user-provided container start
> command? Admittedly, there might be a small fraction of advanced users that
> know how to use these knobs. However, those users usually have their own
> custom fork of Flink, and it should not be a big problem for them to build
> such abilities by themselves.
>
>
> Taking a step back, if we decide that some of the mentioned knobs are
> really useful, we should at least provide enough descriptions to help users
> understand when and how to use these options. E.g., the current description
> for yarn container command template is far from enough, which does not
> explain what the placeholders mean, what happens if some placeholders are
> missing or if an unknown placeholder is provided.
>
>
> Best,
>
> Xintong
>
>
>
> On Tue, Dec 26, 2023 at 7:39 PM Rui Fan <1996fan...@gmail.com> wrote:
>
>> In addition to what is written in FLIP, I found some methods of
>> Configuration
>> are not necessary. And I wanna hear more thoughts from all of you.
>>
>> Configuration has a `public  T get(ConfigOption option)` method.
>> Could we remove all `Xxx getXxx(ConfigOption configOption)` methods?
>> Such as:
>> - public int getInteger(ConfigOption configOption)
>> - public String getString(ConfigOption configOption)
>> - public long getLong(ConfigOption configOption)
>> - etc
>>
>> I prefer users and flink use `get(ConfigOption option)` instead of
>> `getXxx(ConfigOption configOption)` based on some reasons:
>>
>> 1. All callers can replace it directly without any extra effort.
>>   `T get(ConfigOption option)` method can replace all
>>   `Xxx getXxx(ConfigOption configOption)` methods directly.
>> 2. Callers can call get directly, and users or flink developers
>>   don't need to care about should they call getInteger or getString.
>> 3. Flink code is easier to maintain.
>> 4. `T get(ConfigOption option)` is designed later than
>> `Xxx getXxx(ConfigOption configOption)`, I guess if
>> `T get(ConfigOption option)` is designed first,
>>all `Xxx getXxx(ConfigOption configOption)` methods
>>   aren't needed.
>>
>> Note: all `public void setXxx(ConfigOption key, Xxx value)` methods
>> can be replaced with `public  Configuration set(ConfigOption option,
>> T value)` as well.
>>
>> If they can be marked as deprecated, only `public Xxx
>> getXxx(ConfigOption configOption, Xxx overrideDefault)`
>> is keeped after this FLIP. And the rest of getXxx or setXxx can be removed
>> in 2.0.
>>
>> Looking forward to everyone's feedback and suggestions, thank you!
>>
>> Best,
>> Rui
>>
>> On Tue, Dec 26, 2023 at 7:15 PM Rui 

Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-26 Thread Xintong Song
>
> These features don't have a public option, but they work. I'm not sure
> whether these features are used by some advanced users.
> Actually, I think some of them are valuable! For example:
>
> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
>   allows users to define the start command of the yarn container.
> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows
>   flink job reads all files under the directory even if it has nested
> directories.
>
> This FLIP focuses on the refactor option, I'm afraid these features are
> used
> in some production and removing these features will affect some flink jobs.
> So I prefer to keep these features, WDTY?
>

First of all, I don't think we should support any knobs that users can only
learn how to use from reading Flink's internal codes. From this
perspective, for existing string-keyed knobs that are not mentioned in any
public documentation, yes we can argue that they are functioning, but we
can also argue that they are not really exposed to users. That means
migrating them to ConfigOption is not a pure refactor, but would make
something that used to be hidden from users now exposed to users. For such
options, I personally would lean toward not exposing them. If we consider
them as already exposed, then process-wise there's no problem in
deprecating some infrequently-used options and removing them in a major
version bump, and if they are proved needed later we can add them back
anytime. On the other hand, if we consider them as not yet exposed, then
removing them later would be a breaking change.


Secondly, I don't really come up with any cases where users need to tune
these knobs. E.g., why would we allow users to customize the yarn container
start command while we already provide `env.java.opts`? And what would be
the problem if Flink just doesn't support nested files? And even worse,
such knobs may provide chances for users to shoot themself in the foot.
E.g., what if %jvmmem% is missing from a user-provided container start
command? Admittedly, there might be a small fraction of advanced users that
know how to use these knobs. However, those users usually have their own
custom fork of Flink, and it should not be a big problem for them to build
such abilities by themselves.


Taking a step back, if we decide that some of the mentioned knobs are
really useful, we should at least provide enough descriptions to help users
understand when and how to use these options. E.g., the current description
for yarn container command template is far from enough, which does not
explain what the placeholders mean, what happens if some placeholders are
missing or if an unknown placeholder is provided.


Best,

Xintong



On Tue, Dec 26, 2023 at 7:39 PM Rui Fan <1996fan...@gmail.com> wrote:

> In addition to what is written in FLIP, I found some methods of
> Configuration
> are not necessary. And I wanna hear more thoughts from all of you.
>
> Configuration has a `public  T get(ConfigOption option)` method.
> Could we remove all `Xxx getXxx(ConfigOption configOption)` methods?
> Such as:
> - public int getInteger(ConfigOption configOption)
> - public String getString(ConfigOption configOption)
> - public long getLong(ConfigOption configOption)
> - etc
>
> I prefer users and flink use `get(ConfigOption option)` instead of
> `getXxx(ConfigOption configOption)` based on some reasons:
>
> 1. All callers can replace it directly without any extra effort.
>   `T get(ConfigOption option)` method can replace all
>   `Xxx getXxx(ConfigOption configOption)` methods directly.
> 2. Callers can call get directly, and users or flink developers
>   don't need to care about should they call getInteger or getString.
> 3. Flink code is easier to maintain.
> 4. `T get(ConfigOption option)` is designed later than
> `Xxx getXxx(ConfigOption configOption)`, I guess if
> `T get(ConfigOption option)` is designed first,
>all `Xxx getXxx(ConfigOption configOption)` methods
>   aren't needed.
>
> Note: all `public void setXxx(ConfigOption key, Xxx value)` methods
> can be replaced with `public  Configuration set(ConfigOption option,
> T value)` as well.
>
> If they can be marked as deprecated, only `public Xxx
> getXxx(ConfigOption configOption, Xxx overrideDefault)`
> is keeped after this FLIP. And the rest of getXxx or setXxx can be removed
> in 2.0.
>
> Looking forward to everyone's feedback and suggestions, thank you!
>
> Best,
> Rui
>
> On Tue, Dec 26, 2023 at 7:15 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Thanks Xintong for the quick feedback and the good suggestions!
> >
> > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be
> > "no
> > > default". We can explain in the description that if not configured,
> > > `System.getProperty("log.file")` will be used, but that is not a
> default
> > > value.
> >
> > Explain it in the description is fine for me.
> >
> > > 2. So I wonder if we can simply mark them as deprecated and remove in
> > 2.0.
> >
> > 

Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-26 Thread Rui Fan
In addition to what is written in FLIP, I found some methods of
Configuration
are not necessary. And I wanna hear more thoughts from all of you.

Configuration has a `public  T get(ConfigOption option)` method.
Could we remove all `Xxx getXxx(ConfigOption configOption)` methods?
Such as:
- public int getInteger(ConfigOption configOption)
- public String getString(ConfigOption configOption)
- public long getLong(ConfigOption configOption)
- etc

I prefer users and flink use `get(ConfigOption option)` instead of
`getXxx(ConfigOption configOption)` based on some reasons:

1. All callers can replace it directly without any extra effort.
  `T get(ConfigOption option)` method can replace all
  `Xxx getXxx(ConfigOption configOption)` methods directly.
2. Callers can call get directly, and users or flink developers
  don't need to care about should they call getInteger or getString.
3. Flink code is easier to maintain.
4. `T get(ConfigOption option)` is designed later than
`Xxx getXxx(ConfigOption configOption)`, I guess if
`T get(ConfigOption option)` is designed first,
   all `Xxx getXxx(ConfigOption configOption)` methods
  aren't needed.

Note: all `public void setXxx(ConfigOption key, Xxx value)` methods
can be replaced with `public  Configuration set(ConfigOption option,
T value)` as well.

If they can be marked as deprecated, only `public Xxx
getXxx(ConfigOption configOption, Xxx overrideDefault)`
is keeped after this FLIP. And the rest of getXxx or setXxx can be removed
in 2.0.

Looking forward to everyone's feedback and suggestions, thank you!

Best,
Rui

On Tue, Dec 26, 2023 at 7:15 PM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks Xintong for the quick feedback and the good suggestions!
>
> > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be
> "no
> > default". We can explain in the description that if not configured,
> > `System.getProperty("log.file")` will be used, but that is not a default
> > value.
>
> Explain it in the description is fine for me.
>
> > 2. So I wonder if we can simply mark them as deprecated and remove in
> 2.0.
>
> These features don't have a public option, but they work. I'm not sure
> whether these features are used by some advanced users.
> Actually, I think some of them are valuable! For example:
>
> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
>   allows users to define the start command of the yarn container.
> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows
>   flink job reads all files under the directory even if it has nested
> directories.
>
> This FLIP focuses on the refactor option, I'm afraid these features are
> used
> in some production and removing these features will affect some flink jobs.
> So I prefer to keep these features, WDTY?
>
> > 3. Simply saying "getting / setting value with string key is discouraged"
> > in JavaDoc of get/setString is IMHO a bit confusing. People may have the
> > question why would we keep the discouraged interfaces at all. I would
> > suggest the following:
> > ```
> > We encourage users and developers to always use ConfigOption for getting
> /
> > setting the configurations if possible, for its rich description, type,
> > default-value and other supports. The string-key-based getter / setter
> > should only be used when ConfigOption is not applicable, e.g., the key is
> > programmatically generated in runtime.
> > ```
>
> Suggested comment is good for me, and I'd like to hear the thought from
> Xuannan
> who wrote the original comment.
>
> Best,
> Rui
>
> On Tue, Dec 26, 2023 at 5:26 PM Xintong Song 
> wrote:
>
>> Thanks for the efforts, Rui and Xuannan.
>>
>> I think it's a good idea to migrate string-key configuration accesses to
>> ConfigOption-s in general.
>>
>> I have a few suggestions / questions regarding the FLIP.
>>
>> 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no
>> default". We can explain in the description that if not configured,
>> `System.getProperty("log.file")` will be used, but that is not a default
>> value.
>>
>> 2. I wonder if the following string-keys can be simply removed? They are
>> neither set by Flink, nor documented anywhere (AFAIK) so that users know
>> how to set them. All of them were introduced a long time ago, require
>> significant knowledge and familiarity about Flink internals and low level
>> details in order to use, and some of them are even private. So I wonder if
>> we can simply mark them as deprecated and remove in 2.0.
>> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
>> - FileInputFormat.FILE_PARAMETER_KEY
>> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG
>> - FileOutputFormat.FILE_PARAMETER_KEY
>> - BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY
>> - BinaryOutputFormat.BLOCK_SIZE_PARAMETER_KEY
>>
>> 3. Simply saying "getting / setting value with string key is discouraged"
>> in JavaDoc of get/setString is IMHO a bit confusing. People may have the
>> question why would we keep the discouraged interfaces at 

Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-26 Thread Rui Fan
Thanks Xintong for the quick feedback and the good suggestions!

> 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no
> default". We can explain in the description that if not configured,
> `System.getProperty("log.file")` will be used, but that is not a default
> value.

Explain it in the description is fine for me.

> 2. So I wonder if we can simply mark them as deprecated and remove in 2.0.

These features don't have a public option, but they work. I'm not sure
whether these features are used by some advanced users.
Actually, I think some of them are valuable! For example:

- ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
  allows users to define the start command of the yarn container.
- FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows
  flink job reads all files under the directory even if it has nested
directories.

This FLIP focuses on the refactor option, I'm afraid these features are
used
in some production and removing these features will affect some flink jobs.
So I prefer to keep these features, WDTY?

> 3. Simply saying "getting / setting value with string key is discouraged"
> in JavaDoc of get/setString is IMHO a bit confusing. People may have the
> question why would we keep the discouraged interfaces at all. I would
> suggest the following:
> ```
> We encourage users and developers to always use ConfigOption for getting /
> setting the configurations if possible, for its rich description, type,
> default-value and other supports. The string-key-based getter / setter
> should only be used when ConfigOption is not applicable, e.g., the key is
> programmatically generated in runtime.
> ```

Suggested comment is good for me, and I'd like to hear the thought from
Xuannan
who wrote the original comment.

Best,
Rui

On Tue, Dec 26, 2023 at 5:26 PM Xintong Song  wrote:

> Thanks for the efforts, Rui and Xuannan.
>
> I think it's a good idea to migrate string-key configuration accesses to
> ConfigOption-s in general.
>
> I have a few suggestions / questions regarding the FLIP.
>
> 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no
> default". We can explain in the description that if not configured,
> `System.getProperty("log.file")` will be used, but that is not a default
> value.
>
> 2. I wonder if the following string-keys can be simply removed? They are
> neither set by Flink, nor documented anywhere (AFAIK) so that users know
> how to set them. All of them were introduced a long time ago, require
> significant knowledge and familiarity about Flink internals and low level
> details in order to use, and some of them are even private. So I wonder if
> we can simply mark them as deprecated and remove in 2.0.
> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
> - FileInputFormat.FILE_PARAMETER_KEY
> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG
> - FileOutputFormat.FILE_PARAMETER_KEY
> - BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY
> - BinaryOutputFormat.BLOCK_SIZE_PARAMETER_KEY
>
> 3. Simply saying "getting / setting value with string key is discouraged"
> in JavaDoc of get/setString is IMHO a bit confusing. People may have the
> question why would we keep the discouraged interfaces at all. I would
> suggest the following:
> ```
> We encourage users and developers to always use ConfigOption for getting /
> setting the configurations if possible, for its rich description, type,
> default-value and other supports. The string-key-based getter / setter
> should only be used when ConfigOption is not applicable, e.g., the key is
> programmatically generated in runtime.
> ```
>
> WDYT?
>
> Best,
>
> Xintong
>
>
>
> On Tue, Dec 26, 2023 at 4:12 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Hi all,
> >
> > Xuannan(cced) and I would like to start a discussion on FLIP-405:
> > FLIP-405: Migrate string configuration key to ConfigOption[1].
> >
> > As Flink progresses to 2.0, we want to enhance the user experience
> > with the existing configuration. In FLIP-77, Flink introduced
> ConfigOption
> > with DataType and strongly encourage users to utilize ConfigOption
> > instead of string keys for accessing and setting Flink configurations.
> > Presently, many string configuration keys have been deprecated and
> > replaced with ConfigOptions; however, some string configuration
> > keys are still in use.
> >
> > To ensure a better experience with the existing configuration in Flink
> > 2.0,
> > this FLIP will migrate all user-facing string configuration keys to
> > ConfigOptions.
> > Additionally, we want to modify the Configuration infrastructure to
> > promote the use of ConfigOption over string configuration keys
> > among developers and users. It's mentioned in a preview thread[2].
> >
> > Looking forward to everyone's feedback and suggestions, thank you!
> >
> > [1] https://cwiki.apache.org/confluence/x/6Yr5E
> > [2] https://lists.apache.org/thread/zzsf7glfcdjcjm1hfo1xdwc6jp37nb3m
> >
> > Best,
> > Rui
> >
>


Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-26 Thread Xintong Song
Thanks for the efforts, Rui and Xuannan.

I think it's a good idea to migrate string-key configuration accesses to
ConfigOption-s in general.

I have a few suggestions / questions regarding the FLIP.

1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no
default". We can explain in the description that if not configured,
`System.getProperty("log.file")` will be used, but that is not a default
value.

2. I wonder if the following string-keys can be simply removed? They are
neither set by Flink, nor documented anywhere (AFAIK) so that users know
how to set them. All of them were introduced a long time ago, require
significant knowledge and familiarity about Flink internals and low level
details in order to use, and some of them are even private. So I wonder if
we can simply mark them as deprecated and remove in 2.0.
- ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE
- FileInputFormat.FILE_PARAMETER_KEY
- FileInputFormat.ENUMERATE_NESTED_FILES_FLAG
- FileOutputFormat.FILE_PARAMETER_KEY
- BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY
- BinaryOutputFormat.BLOCK_SIZE_PARAMETER_KEY

3. Simply saying "getting / setting value with string key is discouraged"
in JavaDoc of get/setString is IMHO a bit confusing. People may have the
question why would we keep the discouraged interfaces at all. I would
suggest the following:
```
We encourage users and developers to always use ConfigOption for getting /
setting the configurations if possible, for its rich description, type,
default-value and other supports. The string-key-based getter / setter
should only be used when ConfigOption is not applicable, e.g., the key is
programmatically generated in runtime.
```

WDYT?

Best,

Xintong



On Tue, Dec 26, 2023 at 4:12 PM Rui Fan <1996fan...@gmail.com> wrote:

> Hi all,
>
> Xuannan(cced) and I would like to start a discussion on FLIP-405:
> FLIP-405: Migrate string configuration key to ConfigOption[1].
>
> As Flink progresses to 2.0, we want to enhance the user experience
> with the existing configuration. In FLIP-77, Flink introduced ConfigOption
> with DataType and strongly encourage users to utilize ConfigOption
> instead of string keys for accessing and setting Flink configurations.
> Presently, many string configuration keys have been deprecated and
> replaced with ConfigOptions; however, some string configuration
> keys are still in use.
>
> To ensure a better experience with the existing configuration in Flink
> 2.0,
> this FLIP will migrate all user-facing string configuration keys to
> ConfigOptions.
> Additionally, we want to modify the Configuration infrastructure to
> promote the use of ConfigOption over string configuration keys
> among developers and users. It's mentioned in a preview thread[2].
>
> Looking forward to everyone's feedback and suggestions, thank you!
>
> [1] https://cwiki.apache.org/confluence/x/6Yr5E
> [2] https://lists.apache.org/thread/zzsf7glfcdjcjm1hfo1xdwc6jp37nb3m
>
> Best,
> Rui
>


[DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption

2023-12-26 Thread Rui Fan
Hi all,

Xuannan(cced) and I would like to start a discussion on FLIP-405:
FLIP-405: Migrate string configuration key to ConfigOption[1].

As Flink progresses to 2.0, we want to enhance the user experience
with the existing configuration. In FLIP-77, Flink introduced ConfigOption
with DataType and strongly encourage users to utilize ConfigOption
instead of string keys for accessing and setting Flink configurations.
Presently, many string configuration keys have been deprecated and
replaced with ConfigOptions; however, some string configuration
keys are still in use.

To ensure a better experience with the existing configuration in Flink 2.0,
this FLIP will migrate all user-facing string configuration keys to
ConfigOptions.
Additionally, we want to modify the Configuration infrastructure to
promote the use of ConfigOption over string configuration keys
among developers and users. It's mentioned in a preview thread[2].

Looking forward to everyone's feedback and suggestions, thank you!

[1] https://cwiki.apache.org/confluence/x/6Yr5E
[2] https://lists.apache.org/thread/zzsf7glfcdjcjm1hfo1xdwc6jp37nb3m

Best,
Rui