Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
Hi all, Thanks for the discussion. I think all the comments and questions have been addressed. I will open the voting thread today. Best, Xuannan On Tue, Jan 2, 2024 at 11:59 AM Xuannan Su wrote: > > Hi all, > > Thank you for all your comments! The FLIP has been updated > accordingly. Please let me know if you have any further questions or > comments. > > Also, note that many people are on Christmas break, so we will keep > the discussion open for another week. > > Best, > Xuannan > > On Wed, Dec 27, 2023 at 5:20 PM Rui Fan <1996fan...@gmail.com> wrote: > > > > > After some investigation, it turns out those options of input/output > > > format are only publicly exposed in the DataSet docs[2], which is > > > deprecated. Thus, marking them as deprecated and removed in Flink 2.0 > > > looks fine to me. > > > > Thanks Xuannan for the detailed investigation, if so, deprecate them > > and removing them in Flink 2.0 looks good to me. > > > > > I think the key of LOCAL_NUMBER_TASK_MANAGER is better as > > > 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number' > > > instead of 'minicluster.number-taskmanager'. > > > > Thanks Hang for the good suggestion! 'minicluster.number-of-taskmanagers' > > sounds good to me, it's similar to taskmanager.numberOfTaskSlots. > > > > Best, > > Rui > > > > On Wed, Dec 27, 2023 at 1:56 PM Hang Ruan wrote: > >> > >> Hi, Rui Fan. > >> > >> Thanks for this FLIP. > >> > >> I think the key of LOCAL_NUMBER_TASK_MANAGER is better as > >> 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number' > >> instead of 'minicluster.number-taskmanager'. > >> > >> Best, > >> Hang > >> > >> Xuannan Su 于2023年12月27日周三 12:40写道: > >> > >> > Hi Xintong and Rui, > >> > > >> > Thanks for the quick feedback and the suggestions. > >> > > >> > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be > >> > "no > >> > > default". > >> > > >> > I have considered both ways of describing the default value. However, > >> > I found out that some of the configurations, such as `web.tmpdir`, put > >> > `System.getProperty()` in the default value [1]. Some are putting the > >> > description in the default value column[2]. So I just picked the first > >> > one. I am fine with either way, so long as they are consistent. WDYT? > >> > > >> > > 3. Simply saying "getting / setting value with string key is > >> > > discouraged" > >> > > in JavaDoc of get/setString is IMHO a bit confusing. People may have > >> > > the > >> > > question why would we keep the discouraged interfaces at all. I would > >> > > suggest the following: > >> > > ``` > >> > > We encourage users and developers to always use ConfigOption for > >> > > getting > >> > / > >> > > setting the configurations if possible, for its rich description, type, > >> > > default-value and other supports. The string-key-based getter / setter > >> > > should only be used when ConfigOption is not applicable, e.g., the key > >> > > is > >> > > programmatically generated in runtime. > >> > > ``` > >> > > >> > The suggested comment looks good to me. Thanks for the suggestion. I > >> > will update the comment in the FLIP. > >> > > >> > > 2. So I wonder if we can simply mark them as deprecated and remove in > >> > 2.0. > >> > > >> > After some investigation, it turns out those options of input/output > >> > format are only publicly exposed in the DataSet docs[2], which is > >> > deprecated. Thus, marking them as deprecated and removed in Flink 2.0 > >> > looks fine to me. > >> > > >> > > >> > @Rui > >> > > >> > > Configuration has a `public T get(ConfigOption option)` method. > >> > > Could we remove all `Xxx getXxx(ConfigOption configOption)` > >> > > methods? > >> > > >> > +1 Only keep the get(ConfigOption option), > >> > getOptional(ConfigOption option), and set(ConfigOption option, T > >> > value). > >> > > >> > Best, > >> > Xuannan > >> > > >> > [1] > >> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir > >> > [2] > >> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref > >> > [3] > >> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources > >> > > >> > > >> > > >> > > >> > On Tue, Dec 26, 2023 at 8:47 PM Xintong Song > >> > wrote: > >> > > > >> > > > > >> > > > Configuration has a `public T get(ConfigOption option)` > >> > > > method. > >> > > > Could we remove all `Xxx getXxx(ConfigOption configOption)` > >> > methods? > >> > > > >> > > > >> > > > >> > > Note: all `public void setXxx(ConfigOption key, Xxx value)` > >> > > methods > >> > > > can be replaced with `public Configuration set(ConfigOption > >> > option, > >> > > > T value)` as well. > >> > > > >> > > > >> > > +1 > >> > > > >> > > > >> > > Best, > >> > > > >> > > Xintong > >> > > > >> > > > >> > > > >> > > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song > >> > wrote: > >> > > > >> > > > These features don't have
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
Hi all, Thank you for all your comments! The FLIP has been updated accordingly. Please let me know if you have any further questions or comments. Also, note that many people are on Christmas break, so we will keep the discussion open for another week. Best, Xuannan On Wed, Dec 27, 2023 at 5:20 PM Rui Fan <1996fan...@gmail.com> wrote: > > > After some investigation, it turns out those options of input/output > > format are only publicly exposed in the DataSet docs[2], which is > > deprecated. Thus, marking them as deprecated and removed in Flink 2.0 > > looks fine to me. > > Thanks Xuannan for the detailed investigation, if so, deprecate them > and removing them in Flink 2.0 looks good to me. > > > I think the key of LOCAL_NUMBER_TASK_MANAGER is better as > > 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number' > > instead of 'minicluster.number-taskmanager'. > > Thanks Hang for the good suggestion! 'minicluster.number-of-taskmanagers' > sounds good to me, it's similar to taskmanager.numberOfTaskSlots. > > Best, > Rui > > On Wed, Dec 27, 2023 at 1:56 PM Hang Ruan wrote: >> >> Hi, Rui Fan. >> >> Thanks for this FLIP. >> >> I think the key of LOCAL_NUMBER_TASK_MANAGER is better as >> 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number' >> instead of 'minicluster.number-taskmanager'. >> >> Best, >> Hang >> >> Xuannan Su 于2023年12月27日周三 12:40写道: >> >> > Hi Xintong and Rui, >> > >> > Thanks for the quick feedback and the suggestions. >> > >> > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be >> > "no >> > > default". >> > >> > I have considered both ways of describing the default value. However, >> > I found out that some of the configurations, such as `web.tmpdir`, put >> > `System.getProperty()` in the default value [1]. Some are putting the >> > description in the default value column[2]. So I just picked the first >> > one. I am fine with either way, so long as they are consistent. WDYT? >> > >> > > 3. Simply saying "getting / setting value with string key is discouraged" >> > > in JavaDoc of get/setString is IMHO a bit confusing. People may have the >> > > question why would we keep the discouraged interfaces at all. I would >> > > suggest the following: >> > > ``` >> > > We encourage users and developers to always use ConfigOption for getting >> > / >> > > setting the configurations if possible, for its rich description, type, >> > > default-value and other supports. The string-key-based getter / setter >> > > should only be used when ConfigOption is not applicable, e.g., the key is >> > > programmatically generated in runtime. >> > > ``` >> > >> > The suggested comment looks good to me. Thanks for the suggestion. I >> > will update the comment in the FLIP. >> > >> > > 2. So I wonder if we can simply mark them as deprecated and remove in >> > 2.0. >> > >> > After some investigation, it turns out those options of input/output >> > format are only publicly exposed in the DataSet docs[2], which is >> > deprecated. Thus, marking them as deprecated and removed in Flink 2.0 >> > looks fine to me. >> > >> > >> > @Rui >> > >> > > Configuration has a `public T get(ConfigOption option)` method. >> > > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods? >> > >> > +1 Only keep the get(ConfigOption option), >> > getOptional(ConfigOption option), and set(ConfigOption option, T >> > value). >> > >> > Best, >> > Xuannan >> > >> > [1] >> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir >> > [2] >> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref >> > [3] >> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources >> > >> > >> > >> > >> > On Tue, Dec 26, 2023 at 8:47 PM Xintong Song >> > wrote: >> > > >> > > > >> > > > Configuration has a `public T get(ConfigOption option)` method. >> > > > Could we remove all `Xxx getXxx(ConfigOption configOption)` >> > methods? >> > > >> > > >> > > >> > > Note: all `public void setXxx(ConfigOption key, Xxx value)` methods >> > > > can be replaced with `public Configuration set(ConfigOption >> > option, >> > > > T value)` as well. >> > > >> > > >> > > +1 >> > > >> > > >> > > Best, >> > > >> > > Xintong >> > > >> > > >> > > >> > > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song >> > wrote: >> > > >> > > > These features don't have a public option, but they work. I'm not sure >> > > >> whether these features are used by some advanced users. >> > > >> Actually, I think some of them are valuable! For example: >> > > >> >> > > >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE >> > > >> allows users to define the start command of the yarn container. >> > > >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows >> > > >> flink job reads all files under the directory even if it has nested >> > > >> directories. >> > > >> >> > > >> This FLIP focuses on the
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
> After some investigation, it turns out those options of input/output > format are only publicly exposed in the DataSet docs[2], which is > deprecated. Thus, marking them as deprecated and removed in Flink 2.0 > looks fine to me. Thanks Xuannan for the detailed investigation, if so, deprecate them and removing them in Flink 2.0 looks good to me. > I think the key of LOCAL_NUMBER_TASK_MANAGER is better as > 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number' > instead of 'minicluster.number-taskmanager'. Thanks Hang for the good suggestion! 'minicluster.number-of-taskmanagers' sounds good to me, it's similar to taskmanager.numberOfTaskSlots. Best, Rui On Wed, Dec 27, 2023 at 1:56 PM Hang Ruan wrote: > Hi, Rui Fan. > > Thanks for this FLIP. > > I think the key of LOCAL_NUMBER_TASK_MANAGER is better as > 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number' > instead of 'minicluster.number-taskmanager'. > > Best, > Hang > > Xuannan Su 于2023年12月27日周三 12:40写道: > > > Hi Xintong and Rui, > > > > Thanks for the quick feedback and the suggestions. > > > > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be > > "no > > > default". > > > > I have considered both ways of describing the default value. However, > > I found out that some of the configurations, such as `web.tmpdir`, put > > `System.getProperty()` in the default value [1]. Some are putting the > > description in the default value column[2]. So I just picked the first > > one. I am fine with either way, so long as they are consistent. WDYT? > > > > > 3. Simply saying "getting / setting value with string key is > discouraged" > > > in JavaDoc of get/setString is IMHO a bit confusing. People may have > the > > > question why would we keep the discouraged interfaces at all. I would > > > suggest the following: > > > ``` > > > We encourage users and developers to always use ConfigOption for > getting > > / > > > setting the configurations if possible, for its rich description, type, > > > default-value and other supports. The string-key-based getter / setter > > > should only be used when ConfigOption is not applicable, e.g., the key > is > > > programmatically generated in runtime. > > > ``` > > > > The suggested comment looks good to me. Thanks for the suggestion. I > > will update the comment in the FLIP. > > > > > 2. So I wonder if we can simply mark them as deprecated and remove in > > 2.0. > > > > After some investigation, it turns out those options of input/output > > format are only publicly exposed in the DataSet docs[2], which is > > deprecated. Thus, marking them as deprecated and removed in Flink 2.0 > > looks fine to me. > > > > > > @Rui > > > > > Configuration has a `public T get(ConfigOption option)` method. > > > Could we remove all `Xxx getXxx(ConfigOption configOption)` > methods? > > > > +1 Only keep the get(ConfigOption option), > > getOptional(ConfigOption option), and set(ConfigOption option, T > > value). > > > > Best, > > Xuannan > > > > [1] > > > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir > > [2] > > > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref > > [3] > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources > > > > > > > > > > On Tue, Dec 26, 2023 at 8:47 PM Xintong Song > > wrote: > > > > > > > > > > > Configuration has a `public T get(ConfigOption option)` > method. > > > > Could we remove all `Xxx getXxx(ConfigOption configOption)` > > methods? > > > > > > > > > > > > Note: all `public void setXxx(ConfigOption key, Xxx value)` > methods > > > > can be replaced with `public Configuration set(ConfigOption > > option, > > > > T value)` as well. > > > > > > > > > +1 > > > > > > > > > Best, > > > > > > Xintong > > > > > > > > > > > > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song > > wrote: > > > > > > > These features don't have a public option, but they work. I'm not > sure > > > >> whether these features are used by some advanced users. > > > >> Actually, I think some of them are valuable! For example: > > > >> > > > >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE > > > >> allows users to define the start command of the yarn container. > > > >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows > > > >> flink job reads all files under the directory even if it has > nested > > > >> directories. > > > >> > > > >> This FLIP focuses on the refactor option, I'm afraid these features > > are > > > >> used > > > >> in some production and removing these features will affect some > flink > > > >> jobs. > > > >> So I prefer to keep these features, WDTY? > > > >> > > > > > > > > First of all, I don't think we should support any knobs that users > can > > > > only learn how to use from reading Flink's internal codes. From this > > > > perspective, for existing string-keyed knobs that are not mentioned > in >
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
Hi, Rui Fan. Thanks for this FLIP. I think the key of LOCAL_NUMBER_TASK_MANAGER is better as 'minicluster.number-of-taskmanagers' or 'minicluster.taskmanager-number' instead of 'minicluster.number-taskmanager'. Best, Hang Xuannan Su 于2023年12月27日周三 12:40写道: > Hi Xintong and Rui, > > Thanks for the quick feedback and the suggestions. > > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be > "no > > default". > > I have considered both ways of describing the default value. However, > I found out that some of the configurations, such as `web.tmpdir`, put > `System.getProperty()` in the default value [1]. Some are putting the > description in the default value column[2]. So I just picked the first > one. I am fine with either way, so long as they are consistent. WDYT? > > > 3. Simply saying "getting / setting value with string key is discouraged" > > in JavaDoc of get/setString is IMHO a bit confusing. People may have the > > question why would we keep the discouraged interfaces at all. I would > > suggest the following: > > ``` > > We encourage users and developers to always use ConfigOption for getting > / > > setting the configurations if possible, for its rich description, type, > > default-value and other supports. The string-key-based getter / setter > > should only be used when ConfigOption is not applicable, e.g., the key is > > programmatically generated in runtime. > > ``` > > The suggested comment looks good to me. Thanks for the suggestion. I > will update the comment in the FLIP. > > > 2. So I wonder if we can simply mark them as deprecated and remove in > 2.0. > > After some investigation, it turns out those options of input/output > format are only publicly exposed in the DataSet docs[2], which is > deprecated. Thus, marking them as deprecated and removed in Flink 2.0 > looks fine to me. > > > @Rui > > > Configuration has a `public T get(ConfigOption option)` method. > > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods? > > +1 Only keep the get(ConfigOption option), > getOptional(ConfigOption option), and set(ConfigOption option, T > value). > > Best, > Xuannan > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir > [2] > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref > [3] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources > > > > > On Tue, Dec 26, 2023 at 8:47 PM Xintong Song > wrote: > > > > > > > > Configuration has a `public T get(ConfigOption option)` method. > > > Could we remove all `Xxx getXxx(ConfigOption configOption)` > methods? > > > > > > > > Note: all `public void setXxx(ConfigOption key, Xxx value)` methods > > > can be replaced with `public Configuration set(ConfigOption > option, > > > T value)` as well. > > > > > > +1 > > > > > > Best, > > > > Xintong > > > > > > > > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song > wrote: > > > > > These features don't have a public option, but they work. I'm not sure > > >> whether these features are used by some advanced users. > > >> Actually, I think some of them are valuable! For example: > > >> > > >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE > > >> allows users to define the start command of the yarn container. > > >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows > > >> flink job reads all files under the directory even if it has nested > > >> directories. > > >> > > >> This FLIP focuses on the refactor option, I'm afraid these features > are > > >> used > > >> in some production and removing these features will affect some flink > > >> jobs. > > >> So I prefer to keep these features, WDTY? > > >> > > > > > > First of all, I don't think we should support any knobs that users can > > > only learn how to use from reading Flink's internal codes. From this > > > perspective, for existing string-keyed knobs that are not mentioned in > any > > > public documentation, yes we can argue that they are functioning, but > we > > > can also argue that they are not really exposed to users. That means > > > migrating them to ConfigOption is not a pure refactor, but would make > > > something that used to be hidden from users now exposed to users. For > such > > > options, I personally would lean toward not exposing them. If we > consider > > > them as already exposed, then process-wise there's no problem in > > > deprecating some infrequently-used options and removing them in a major > > > version bump, and if they are proved needed later we can add them back > > > anytime. On the other hand, if we consider them as not yet exposed, > then > > > removing them later would be a breaking change. > > > > > > > > > Secondly, I don't really come up with any cases where users need to > tune > > > these knobs. E.g., why would we allow users to customize the yarn > container > > > start command while we already provide `env.java.opts`? And
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
Hi Xintong and Rui, Thanks for the quick feedback and the suggestions. > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no > default". I have considered both ways of describing the default value. However, I found out that some of the configurations, such as `web.tmpdir`, put `System.getProperty()` in the default value [1]. Some are putting the description in the default value column[2]. So I just picked the first one. I am fine with either way, so long as they are consistent. WDYT? > 3. Simply saying "getting / setting value with string key is discouraged" > in JavaDoc of get/setString is IMHO a bit confusing. People may have the > question why would we keep the discouraged interfaces at all. I would > suggest the following: > ``` > We encourage users and developers to always use ConfigOption for getting / > setting the configurations if possible, for its rich description, type, > default-value and other supports. The string-key-based getter / setter > should only be used when ConfigOption is not applicable, e.g., the key is > programmatically generated in runtime. > ``` The suggested comment looks good to me. Thanks for the suggestion. I will update the comment in the FLIP. > 2. So I wonder if we can simply mark them as deprecated and remove in 2.0. After some investigation, it turns out those options of input/output format are only publicly exposed in the DataSet docs[2], which is deprecated. Thus, marking them as deprecated and removed in Flink 2.0 looks fine to me. @Rui > Configuration has a `public T get(ConfigOption option)` method. > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods? +1 Only keep the get(ConfigOption option), getOptional(ConfigOption option), and set(ConfigOption option, T value). Best, Xuannan [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#web-tmpdir [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#kubernetes-container-image-ref [3] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/dataset/overview/#data-sources On Tue, Dec 26, 2023 at 8:47 PM Xintong Song wrote: > > > > > Configuration has a `public T get(ConfigOption option)` method. > > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods? > > > > Note: all `public void setXxx(ConfigOption key, Xxx value)` methods > > can be replaced with `public Configuration set(ConfigOption option, > > T value)` as well. > > > +1 > > > Best, > > Xintong > > > > On Tue, Dec 26, 2023 at 8:44 PM Xintong Song wrote: > > > These features don't have a public option, but they work. I'm not sure > >> whether these features are used by some advanced users. > >> Actually, I think some of them are valuable! For example: > >> > >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE > >> allows users to define the start command of the yarn container. > >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows > >> flink job reads all files under the directory even if it has nested > >> directories. > >> > >> This FLIP focuses on the refactor option, I'm afraid these features are > >> used > >> in some production and removing these features will affect some flink > >> jobs. > >> So I prefer to keep these features, WDTY? > >> > > > > First of all, I don't think we should support any knobs that users can > > only learn how to use from reading Flink's internal codes. From this > > perspective, for existing string-keyed knobs that are not mentioned in any > > public documentation, yes we can argue that they are functioning, but we > > can also argue that they are not really exposed to users. That means > > migrating them to ConfigOption is not a pure refactor, but would make > > something that used to be hidden from users now exposed to users. For such > > options, I personally would lean toward not exposing them. If we consider > > them as already exposed, then process-wise there's no problem in > > deprecating some infrequently-used options and removing them in a major > > version bump, and if they are proved needed later we can add them back > > anytime. On the other hand, if we consider them as not yet exposed, then > > removing them later would be a breaking change. > > > > > > Secondly, I don't really come up with any cases where users need to tune > > these knobs. E.g., why would we allow users to customize the yarn container > > start command while we already provide `env.java.opts`? And what would be > > the problem if Flink just doesn't support nested files? And even worse, > > such knobs may provide chances for users to shoot themself in the foot. > > E.g., what if %jvmmem% is missing from a user-provided container start > > command? Admittedly, there might be a small fraction of advanced users that > > know how to use these knobs. However, those users usually have their own > > custom fork of Flink, and it should not be a big problem for them to build > > such abilities by
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
> > Configuration has a `public T get(ConfigOption option)` method. > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods? Note: all `public void setXxx(ConfigOption key, Xxx value)` methods > can be replaced with `public Configuration set(ConfigOption option, > T value)` as well. +1 Best, Xintong On Tue, Dec 26, 2023 at 8:44 PM Xintong Song wrote: > These features don't have a public option, but they work. I'm not sure >> whether these features are used by some advanced users. >> Actually, I think some of them are valuable! For example: >> >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE >> allows users to define the start command of the yarn container. >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows >> flink job reads all files under the directory even if it has nested >> directories. >> >> This FLIP focuses on the refactor option, I'm afraid these features are >> used >> in some production and removing these features will affect some flink >> jobs. >> So I prefer to keep these features, WDTY? >> > > First of all, I don't think we should support any knobs that users can > only learn how to use from reading Flink's internal codes. From this > perspective, for existing string-keyed knobs that are not mentioned in any > public documentation, yes we can argue that they are functioning, but we > can also argue that they are not really exposed to users. That means > migrating them to ConfigOption is not a pure refactor, but would make > something that used to be hidden from users now exposed to users. For such > options, I personally would lean toward not exposing them. If we consider > them as already exposed, then process-wise there's no problem in > deprecating some infrequently-used options and removing them in a major > version bump, and if they are proved needed later we can add them back > anytime. On the other hand, if we consider them as not yet exposed, then > removing them later would be a breaking change. > > > Secondly, I don't really come up with any cases where users need to tune > these knobs. E.g., why would we allow users to customize the yarn container > start command while we already provide `env.java.opts`? And what would be > the problem if Flink just doesn't support nested files? And even worse, > such knobs may provide chances for users to shoot themself in the foot. > E.g., what if %jvmmem% is missing from a user-provided container start > command? Admittedly, there might be a small fraction of advanced users that > know how to use these knobs. However, those users usually have their own > custom fork of Flink, and it should not be a big problem for them to build > such abilities by themselves. > > > Taking a step back, if we decide that some of the mentioned knobs are > really useful, we should at least provide enough descriptions to help users > understand when and how to use these options. E.g., the current description > for yarn container command template is far from enough, which does not > explain what the placeholders mean, what happens if some placeholders are > missing or if an unknown placeholder is provided. > > > Best, > > Xintong > > > > On Tue, Dec 26, 2023 at 7:39 PM Rui Fan <1996fan...@gmail.com> wrote: > >> In addition to what is written in FLIP, I found some methods of >> Configuration >> are not necessary. And I wanna hear more thoughts from all of you. >> >> Configuration has a `public T get(ConfigOption option)` method. >> Could we remove all `Xxx getXxx(ConfigOption configOption)` methods? >> Such as: >> - public int getInteger(ConfigOption configOption) >> - public String getString(ConfigOption configOption) >> - public long getLong(ConfigOption configOption) >> - etc >> >> I prefer users and flink use `get(ConfigOption option)` instead of >> `getXxx(ConfigOption configOption)` based on some reasons: >> >> 1. All callers can replace it directly without any extra effort. >> `T get(ConfigOption option)` method can replace all >> `Xxx getXxx(ConfigOption configOption)` methods directly. >> 2. Callers can call get directly, and users or flink developers >> don't need to care about should they call getInteger or getString. >> 3. Flink code is easier to maintain. >> 4. `T get(ConfigOption option)` is designed later than >> `Xxx getXxx(ConfigOption configOption)`, I guess if >> `T get(ConfigOption option)` is designed first, >>all `Xxx getXxx(ConfigOption configOption)` methods >> aren't needed. >> >> Note: all `public void setXxx(ConfigOption key, Xxx value)` methods >> can be replaced with `public Configuration set(ConfigOption option, >> T value)` as well. >> >> If they can be marked as deprecated, only `public Xxx >> getXxx(ConfigOption configOption, Xxx overrideDefault)` >> is keeped after this FLIP. And the rest of getXxx or setXxx can be removed >> in 2.0. >> >> Looking forward to everyone's feedback and suggestions, thank you! >> >> Best, >> Rui >> >> On Tue, Dec 26, 2023 at 7:15 PM Rui
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
> > These features don't have a public option, but they work. I'm not sure > whether these features are used by some advanced users. > Actually, I think some of them are valuable! For example: > > - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE > allows users to define the start command of the yarn container. > - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows > flink job reads all files under the directory even if it has nested > directories. > > This FLIP focuses on the refactor option, I'm afraid these features are > used > in some production and removing these features will affect some flink jobs. > So I prefer to keep these features, WDTY? > First of all, I don't think we should support any knobs that users can only learn how to use from reading Flink's internal codes. From this perspective, for existing string-keyed knobs that are not mentioned in any public documentation, yes we can argue that they are functioning, but we can also argue that they are not really exposed to users. That means migrating them to ConfigOption is not a pure refactor, but would make something that used to be hidden from users now exposed to users. For such options, I personally would lean toward not exposing them. If we consider them as already exposed, then process-wise there's no problem in deprecating some infrequently-used options and removing them in a major version bump, and if they are proved needed later we can add them back anytime. On the other hand, if we consider them as not yet exposed, then removing them later would be a breaking change. Secondly, I don't really come up with any cases where users need to tune these knobs. E.g., why would we allow users to customize the yarn container start command while we already provide `env.java.opts`? And what would be the problem if Flink just doesn't support nested files? And even worse, such knobs may provide chances for users to shoot themself in the foot. E.g., what if %jvmmem% is missing from a user-provided container start command? Admittedly, there might be a small fraction of advanced users that know how to use these knobs. However, those users usually have their own custom fork of Flink, and it should not be a big problem for them to build such abilities by themselves. Taking a step back, if we decide that some of the mentioned knobs are really useful, we should at least provide enough descriptions to help users understand when and how to use these options. E.g., the current description for yarn container command template is far from enough, which does not explain what the placeholders mean, what happens if some placeholders are missing or if an unknown placeholder is provided. Best, Xintong On Tue, Dec 26, 2023 at 7:39 PM Rui Fan <1996fan...@gmail.com> wrote: > In addition to what is written in FLIP, I found some methods of > Configuration > are not necessary. And I wanna hear more thoughts from all of you. > > Configuration has a `public T get(ConfigOption option)` method. > Could we remove all `Xxx getXxx(ConfigOption configOption)` methods? > Such as: > - public int getInteger(ConfigOption configOption) > - public String getString(ConfigOption configOption) > - public long getLong(ConfigOption configOption) > - etc > > I prefer users and flink use `get(ConfigOption option)` instead of > `getXxx(ConfigOption configOption)` based on some reasons: > > 1. All callers can replace it directly without any extra effort. > `T get(ConfigOption option)` method can replace all > `Xxx getXxx(ConfigOption configOption)` methods directly. > 2. Callers can call get directly, and users or flink developers > don't need to care about should they call getInteger or getString. > 3. Flink code is easier to maintain. > 4. `T get(ConfigOption option)` is designed later than > `Xxx getXxx(ConfigOption configOption)`, I guess if > `T get(ConfigOption option)` is designed first, >all `Xxx getXxx(ConfigOption configOption)` methods > aren't needed. > > Note: all `public void setXxx(ConfigOption key, Xxx value)` methods > can be replaced with `public Configuration set(ConfigOption option, > T value)` as well. > > If they can be marked as deprecated, only `public Xxx > getXxx(ConfigOption configOption, Xxx overrideDefault)` > is keeped after this FLIP. And the rest of getXxx or setXxx can be removed > in 2.0. > > Looking forward to everyone's feedback and suggestions, thank you! > > Best, > Rui > > On Tue, Dec 26, 2023 at 7:15 PM Rui Fan <1996fan...@gmail.com> wrote: > > > Thanks Xintong for the quick feedback and the good suggestions! > > > > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be > > "no > > > default". We can explain in the description that if not configured, > > > `System.getProperty("log.file")` will be used, but that is not a > default > > > value. > > > > Explain it in the description is fine for me. > > > > > 2. So I wonder if we can simply mark them as deprecated and remove in > > 2.0. > > > >
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
In addition to what is written in FLIP, I found some methods of Configuration are not necessary. And I wanna hear more thoughts from all of you. Configuration has a `public T get(ConfigOption option)` method. Could we remove all `Xxx getXxx(ConfigOption configOption)` methods? Such as: - public int getInteger(ConfigOption configOption) - public String getString(ConfigOption configOption) - public long getLong(ConfigOption configOption) - etc I prefer users and flink use `get(ConfigOption option)` instead of `getXxx(ConfigOption configOption)` based on some reasons: 1. All callers can replace it directly without any extra effort. `T get(ConfigOption option)` method can replace all `Xxx getXxx(ConfigOption configOption)` methods directly. 2. Callers can call get directly, and users or flink developers don't need to care about should they call getInteger or getString. 3. Flink code is easier to maintain. 4. `T get(ConfigOption option)` is designed later than `Xxx getXxx(ConfigOption configOption)`, I guess if `T get(ConfigOption option)` is designed first, all `Xxx getXxx(ConfigOption configOption)` methods aren't needed. Note: all `public void setXxx(ConfigOption key, Xxx value)` methods can be replaced with `public Configuration set(ConfigOption option, T value)` as well. If they can be marked as deprecated, only `public Xxx getXxx(ConfigOption configOption, Xxx overrideDefault)` is keeped after this FLIP. And the rest of getXxx or setXxx can be removed in 2.0. Looking forward to everyone's feedback and suggestions, thank you! Best, Rui On Tue, Dec 26, 2023 at 7:15 PM Rui Fan <1996fan...@gmail.com> wrote: > Thanks Xintong for the quick feedback and the good suggestions! > > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be > "no > > default". We can explain in the description that if not configured, > > `System.getProperty("log.file")` will be used, but that is not a default > > value. > > Explain it in the description is fine for me. > > > 2. So I wonder if we can simply mark them as deprecated and remove in > 2.0. > > These features don't have a public option, but they work. I'm not sure > whether these features are used by some advanced users. > Actually, I think some of them are valuable! For example: > > - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE > allows users to define the start command of the yarn container. > - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows > flink job reads all files under the directory even if it has nested > directories. > > This FLIP focuses on the refactor option, I'm afraid these features are > used > in some production and removing these features will affect some flink jobs. > So I prefer to keep these features, WDTY? > > > 3. Simply saying "getting / setting value with string key is discouraged" > > in JavaDoc of get/setString is IMHO a bit confusing. People may have the > > question why would we keep the discouraged interfaces at all. I would > > suggest the following: > > ``` > > We encourage users and developers to always use ConfigOption for getting > / > > setting the configurations if possible, for its rich description, type, > > default-value and other supports. The string-key-based getter / setter > > should only be used when ConfigOption is not applicable, e.g., the key is > > programmatically generated in runtime. > > ``` > > Suggested comment is good for me, and I'd like to hear the thought from > Xuannan > who wrote the original comment. > > Best, > Rui > > On Tue, Dec 26, 2023 at 5:26 PM Xintong Song > wrote: > >> Thanks for the efforts, Rui and Xuannan. >> >> I think it's a good idea to migrate string-key configuration accesses to >> ConfigOption-s in general. >> >> I have a few suggestions / questions regarding the FLIP. >> >> 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no >> default". We can explain in the description that if not configured, >> `System.getProperty("log.file")` will be used, but that is not a default >> value. >> >> 2. I wonder if the following string-keys can be simply removed? They are >> neither set by Flink, nor documented anywhere (AFAIK) so that users know >> how to set them. All of them were introduced a long time ago, require >> significant knowledge and familiarity about Flink internals and low level >> details in order to use, and some of them are even private. So I wonder if >> we can simply mark them as deprecated and remove in 2.0. >> - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE >> - FileInputFormat.FILE_PARAMETER_KEY >> - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG >> - FileOutputFormat.FILE_PARAMETER_KEY >> - BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY >> - BinaryOutputFormat.BLOCK_SIZE_PARAMETER_KEY >> >> 3. Simply saying "getting / setting value with string key is discouraged" >> in JavaDoc of get/setString is IMHO a bit confusing. People may have the >> question why would we keep the discouraged interfaces at
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
Thanks Xintong for the quick feedback and the good suggestions! > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no > default". We can explain in the description that if not configured, > `System.getProperty("log.file")` will be used, but that is not a default > value. Explain it in the description is fine for me. > 2. So I wonder if we can simply mark them as deprecated and remove in 2.0. These features don't have a public option, but they work. I'm not sure whether these features are used by some advanced users. Actually, I think some of them are valuable! For example: - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE allows users to define the start command of the yarn container. - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG allows flink job reads all files under the directory even if it has nested directories. This FLIP focuses on the refactor option, I'm afraid these features are used in some production and removing these features will affect some flink jobs. So I prefer to keep these features, WDTY? > 3. Simply saying "getting / setting value with string key is discouraged" > in JavaDoc of get/setString is IMHO a bit confusing. People may have the > question why would we keep the discouraged interfaces at all. I would > suggest the following: > ``` > We encourage users and developers to always use ConfigOption for getting / > setting the configurations if possible, for its rich description, type, > default-value and other supports. The string-key-based getter / setter > should only be used when ConfigOption is not applicable, e.g., the key is > programmatically generated in runtime. > ``` Suggested comment is good for me, and I'd like to hear the thought from Xuannan who wrote the original comment. Best, Rui On Tue, Dec 26, 2023 at 5:26 PM Xintong Song wrote: > Thanks for the efforts, Rui and Xuannan. > > I think it's a good idea to migrate string-key configuration accesses to > ConfigOption-s in general. > > I have a few suggestions / questions regarding the FLIP. > > 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no > default". We can explain in the description that if not configured, > `System.getProperty("log.file")` will be used, but that is not a default > value. > > 2. I wonder if the following string-keys can be simply removed? They are > neither set by Flink, nor documented anywhere (AFAIK) so that users know > how to set them. All of them were introduced a long time ago, require > significant knowledge and familiarity about Flink internals and low level > details in order to use, and some of them are even private. So I wonder if > we can simply mark them as deprecated and remove in 2.0. > - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE > - FileInputFormat.FILE_PARAMETER_KEY > - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG > - FileOutputFormat.FILE_PARAMETER_KEY > - BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY > - BinaryOutputFormat.BLOCK_SIZE_PARAMETER_KEY > > 3. Simply saying "getting / setting value with string key is discouraged" > in JavaDoc of get/setString is IMHO a bit confusing. People may have the > question why would we keep the discouraged interfaces at all. I would > suggest the following: > ``` > We encourage users and developers to always use ConfigOption for getting / > setting the configurations if possible, for its rich description, type, > default-value and other supports. The string-key-based getter / setter > should only be used when ConfigOption is not applicable, e.g., the key is > programmatically generated in runtime. > ``` > > WDYT? > > Best, > > Xintong > > > > On Tue, Dec 26, 2023 at 4:12 PM Rui Fan <1996fan...@gmail.com> wrote: > > > Hi all, > > > > Xuannan(cced) and I would like to start a discussion on FLIP-405: > > FLIP-405: Migrate string configuration key to ConfigOption[1]. > > > > As Flink progresses to 2.0, we want to enhance the user experience > > with the existing configuration. In FLIP-77, Flink introduced > ConfigOption > > with DataType and strongly encourage users to utilize ConfigOption > > instead of string keys for accessing and setting Flink configurations. > > Presently, many string configuration keys have been deprecated and > > replaced with ConfigOptions; however, some string configuration > > keys are still in use. > > > > To ensure a better experience with the existing configuration in Flink > > 2.0, > > this FLIP will migrate all user-facing string configuration keys to > > ConfigOptions. > > Additionally, we want to modify the Configuration infrastructure to > > promote the use of ConfigOption over string configuration keys > > among developers and users. It's mentioned in a preview thread[2]. > > > > Looking forward to everyone's feedback and suggestions, thank you! > > > > [1] https://cwiki.apache.org/confluence/x/6Yr5E > > [2] https://lists.apache.org/thread/zzsf7glfcdjcjm1hfo1xdwc6jp37nb3m > > > > Best, > > Rui > > >
Re: [DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
Thanks for the efforts, Rui and Xuannan. I think it's a good idea to migrate string-key configuration accesses to ConfigOption-s in general. I have a few suggestions / questions regarding the FLIP. 1. I think the default value for `TASK_MANAGER_LOG_PATH_KEY` should be "no default". We can explain in the description that if not configured, `System.getProperty("log.file")` will be used, but that is not a default value. 2. I wonder if the following string-keys can be simply removed? They are neither set by Flink, nor documented anywhere (AFAIK) so that users know how to set them. All of them were introduced a long time ago, require significant knowledge and familiarity about Flink internals and low level details in order to use, and some of them are even private. So I wonder if we can simply mark them as deprecated and remove in 2.0. - ConfigConstants.YARN_CONTAINER_START_COMMAND_TEMPLATE - FileInputFormat.FILE_PARAMETER_KEY - FileInputFormat.ENUMERATE_NESTED_FILES_FLAG - FileOutputFormat.FILE_PARAMETER_KEY - BinaryInputFormat.BLOCK_SIZE_PARAMETER_KEY - BinaryOutputFormat.BLOCK_SIZE_PARAMETER_KEY 3. Simply saying "getting / setting value with string key is discouraged" in JavaDoc of get/setString is IMHO a bit confusing. People may have the question why would we keep the discouraged interfaces at all. I would suggest the following: ``` We encourage users and developers to always use ConfigOption for getting / setting the configurations if possible, for its rich description, type, default-value and other supports. The string-key-based getter / setter should only be used when ConfigOption is not applicable, e.g., the key is programmatically generated in runtime. ``` WDYT? Best, Xintong On Tue, Dec 26, 2023 at 4:12 PM Rui Fan <1996fan...@gmail.com> wrote: > Hi all, > > Xuannan(cced) and I would like to start a discussion on FLIP-405: > FLIP-405: Migrate string configuration key to ConfigOption[1]. > > As Flink progresses to 2.0, we want to enhance the user experience > with the existing configuration. In FLIP-77, Flink introduced ConfigOption > with DataType and strongly encourage users to utilize ConfigOption > instead of string keys for accessing and setting Flink configurations. > Presently, many string configuration keys have been deprecated and > replaced with ConfigOptions; however, some string configuration > keys are still in use. > > To ensure a better experience with the existing configuration in Flink > 2.0, > this FLIP will migrate all user-facing string configuration keys to > ConfigOptions. > Additionally, we want to modify the Configuration infrastructure to > promote the use of ConfigOption over string configuration keys > among developers and users. It's mentioned in a preview thread[2]. > > Looking forward to everyone's feedback and suggestions, thank you! > > [1] https://cwiki.apache.org/confluence/x/6Yr5E > [2] https://lists.apache.org/thread/zzsf7glfcdjcjm1hfo1xdwc6jp37nb3m > > Best, > Rui >
[DISCUSS] FLIP-405: Migrate string configuration key to ConfigOption
Hi all, Xuannan(cced) and I would like to start a discussion on FLIP-405: FLIP-405: Migrate string configuration key to ConfigOption[1]. As Flink progresses to 2.0, we want to enhance the user experience with the existing configuration. In FLIP-77, Flink introduced ConfigOption with DataType and strongly encourage users to utilize ConfigOption instead of string keys for accessing and setting Flink configurations. Presently, many string configuration keys have been deprecated and replaced with ConfigOptions; however, some string configuration keys are still in use. To ensure a better experience with the existing configuration in Flink 2.0, this FLIP will migrate all user-facing string configuration keys to ConfigOptions. Additionally, we want to modify the Configuration infrastructure to promote the use of ConfigOption over string configuration keys among developers and users. It's mentioned in a preview thread[2]. Looking forward to everyone's feedback and suggestions, thank you! [1] https://cwiki.apache.org/confluence/x/6Yr5E [2] https://lists.apache.org/thread/zzsf7glfcdjcjm1hfo1xdwc6jp37nb3m Best, Rui