Hi everyone,

It seems we still don't have a consensus on the rules for boolean type
options. Let me recap the alternatives we have:

Option 1: Use enumeration options instead if possible. But this may cause
some name collisions or confusion as we discussed and we should unify the
statement everywhere.
Option 2: Use boolean options and add 'enabled' as the suffix.
Option 3: Use boolean options and ONLY add 'enabled' when there are more
detailed configurations under the same prefix, to prevent one name from
serving as a prefix to another.

I am inclined to Option 3, since it is more in line with current practice
and friendly for existing users. Also It reduces the length of
configuration names as much as possible.

Looking forward to your opinions! Thanks!


Best,
Zakelly

On Wed, Jan 10, 2024 at 3:30 PM Zakelly Lan <zakelly....@gmail.com> wrote:

> Hi Hangxiang,
>
> Thanks for your suggestions!
>
> 1. Could execution.recovery also contain some other behaviors about
>> recovery ? e.g. restart-strategy.
>
>
> That's a very good point. I realize that the word 'recovery' means way too
> many things. So I suggest picking a more specific word here, how about
> 'execution.state-recovery.*' ? Checkpointing and state recovery are
> corresponding terms and won't make ambiguity.
>
> 2. Could we also remove some legacy configuration value ? e.g. LEGACY Mode
>> for execution.savepoint-restore-mode/execution.recovery.claim-mode.
>
>
> I think we could create another FLIP for the deprecation of LEGACY mode.
>
>
>> 3. Could the local checkpoint be cleaned
>> if execution.checkpointing.local-copy.enabled is true and
>> execution.recovery.from-local is false ? I found it's also an issue if
>> current local-recovery from enabled to disabled. Maybe another ticket is
>> needed.
>
>
> IIUC, there is no clear ownership of the local copy files from the
> previous job and it's better to define one. This needs more discussion so
> we could create another thread for this. WDYT?
>
>
> Best,
> Zakelly
>
> On Tue, Jan 9, 2024 at 11:23 AM Hangxiang Yu <master...@gmail.com> wrote:
>
>> Hi, Zakelly.
>> Thanks for driving this. Overall LGTM as we discussed offline.
>>
>> Some comments/suggestions just came to mind:
>> 1. Could execution.recovery also contain some other behaviors about
>> recovery ? e.g. restart-strategy.
>> 2. Could we also remove some legacy configuration value ? e.g. LEGACY Mode
>> for execution.savepoint-restore-mode/execution.recovery.claim-mode.
>> 3. Could the local checkpoint be cleaned
>> if execution.checkpointing.local-copy.enabled is true and
>> execution.recovery.from-local is false ? I found it's also an issue if
>> current local-recovery from enabled to disabled. Maybe another ticket is
>> needed.
>> 4. +1 for enabling execution.checkpointing.incremental by default which is
>> basically default configuration in our production environment.
>>
>>
>> On Mon, Jan 8, 2024 at 6:06 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>>
>> > Hi Yun,
>> >
>> > Thanks for your comments!
>> >
>> >  1.  We shall not describe the configuration with its implementation for
>> > > 'execution.checkpointing.local-copy.*' options, for hashmap
>> > state-backend,
>> > > it would write two streams and for Rocksdb state-backend, it would use
>> > > hard-link for backup. Thus, I think
>> > > 'execution.checkpointing.local-backup.*' looks better.
>> >
>> > I agreed that we'd better name the option in user's perspective instead
>> of
>> > the implementation, thus I name it as a copy of the checkpoint in the
>> > local disk, regardless of the way of generating it. The word 'backup' is
>> > also suitable for this case, so I agree to change to
>> > 'execution.checkpointing.local-backup.*' if no one objects.
>> >
>> >  2.  What does the 'execution.checkpointing.data-inline-threshold'
>> mean? It
>> > > seems not so easy to understand.
>> >
>> > The 'execution.checkpointing.data-inline-threshold' (original one as
>> > 'state.storage.fs.memory-threshold') stands for the size threshold below
>> > which state chunks will store inline with the metadata, thus I call it
>> > 'data-inline-threshold'.
>> >
>> >
>> > Best,
>> > Zakelly
>> >
>> > On Mon, Jan 8, 2024 at 10:09 AM Yun Tang <myas...@live.com> wrote:
>> >
>> > > Hi Zakelly,
>> > >
>> > > Thanks for driving this topic. I have two concerns here:
>> > >
>> > >   1.  We shall not describe the configuration with its implementation
>> for
>> > > 'execution.checkpointing.local-copy.*' options, for hashmap
>> > state-backend,
>> > > it would write two streams and for Rocksdb state-backend, it would use
>> > > hard-link for backup. Thus, I think
>> > > 'execution.checkpointing.local-backup.*' looks better.
>> > >   2.  What does the 'execution.checkpointing.data-inline-threshold'
>> mean?
>> > > It seems not so easy to understand.
>> > >
>> > > Best
>> > > Yun Tang
>> > > ________________________________
>> > > From: Piotr Nowojski <pnowoj...@apache.org>
>> > > Sent: Thursday, January 4, 2024 22:37
>> > > To: dev@flink.apache.org <dev@flink.apache.org>
>> > > Subject: Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing &
>> > > Recovery Configuration
>> > >
>> > > Hi,
>> > >
>> > > Thanks for trying to clean this up! I don't have strong opinions on
>> the
>> > > topics discussed here, so generally speaking +1 from my side!
>> > >
>> > > Best,
>> > > Piotrek
>> > >
>> > > śr., 3 sty 2024 o 04:16 Rui Fan <1996fan...@gmail.com> napisał(a):
>> > >
>> > > > Thanks for the feedback!
>> > > >
>> > > > Using the `execution.checkpointing.incremental.enabled`,
>> > > > and enabling it by default sounds good to me.
>> > > >
>> > > > Best,
>> > > > Rui
>> > > >
>> > > > On Wed, Jan 3, 2024 at 11:10 AM Zakelly Lan <zakelly....@gmail.com>
>> > > wrote:
>> > > >
>> > > > > Hi Rui,
>> > > > >
>> > > > > Thanks for your comments!
>> > > > >
>> > > > > IMO, given that the state backend can be plugably loaded (as you
>> can
>> > > > > specify a state backend factory), I prefer not providing state
>> > backend
>> > > > > specified options in the framework.
>> > > > >
>> > > > > Secondly, the incremental checkpoint is actually a sharing file
>> > > strategy
>> > > > > across checkpoints, which means the state backend *could* reuse
>> files
>> > > > from
>> > > > > previous cp but not *must* do so. When the state backend could not
>> > > reuse
>> > > > > the files, it is reasonable to fallback to a full checkpoint.
>> > > > >
>> > > > > Thus, I suggest we make it `execution.checkpointing.incremental`
>> and
>> > > > enable
>> > > > > it by default. For those state backends not supporting this, they
>> > > perform
>> > > > > full checkpoints and print a warning to inform users. Users do not
>> > need
>> > > > to
>> > > > > pay special attention to different options to control this across
>> > > > different
>> > > > > state backends. This is more user-friendly in my opinion. WDYT?
>> > > > >
>> > > > > On Tue, Jan 2, 2024 at 10:49 AM Rui Fan <1996fan...@gmail.com>
>> > wrote:
>> > > > >
>> > > > > > Hi Zakelly,
>> > > > > >
>> > > > > > I'm not sure whether we could add the state backend type in the
>> > > > > > new key name of state.backend.incremental. It means we use
>> > > > > > `execution.checkpointing.rocksdb-incremental` or
>> > > > > > `execution.checkpointing.rocksdb-incremental.enabled`.
>> > > > > >
>> > > > > > So far, state.backend.incremental only works for rocksdb state
>> > > backend.
>> > > > > > And this feature or optimization is very valuable and huge for
>> > large
>> > > > > > state flink jobs. I believe it's enabled for most production
>> flink
>> > > jobs
>> > > > > > with large rocksdb state.
>> > > > > >
>> > > > > > If this option isn't generic for all state backend types, I
>> guess
>> > we
>> > > > > > can enable `execution.checkpointing.rocksdb-incremental.enabled`
>> > > > > > by default in Flink 2.0.
>> > > > > >
>> > > > > > But if it works for all state backends, it's hard to enable it
>> by
>> > > > > default.
>> > > > > > Enabling great and valuable features or improvements are useful
>> > > > > > for users, especially a lot of new flink users. Out-of-the-box
>> > > options
>> > > > > > are good for users.
>> > > > > >
>> > > > > > WDYT?
>> > > > > >
>> > > > > > Best,
>> > > > > > Rui
>> > > > > >
>> > > > > > On Fri, Dec 29, 2023 at 1:45 PM Zakelly Lan <
>> zakelly....@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi everyone,
>> > > > > > >
>> > > > > > > Thanks all for your comments!
>> > > > > > >
>> > > > > > > As many of you have questions about the names for boolean
>> > options,
>> > > I
>> > > > > > > suggest we make a naming rule for them. For now I could think
>> of
>> > > > three
>> > > > > > > options:
>> > > > > > >
>> > > > > > > Option 1: Use enumeration options if possible. But this may
>> cause
>> > > > some
>> > > > > > name
>> > > > > > > collisions or confusion as we discussed and we should unify
>> the
>> > > > > statement
>> > > > > > > everywhere.
>> > > > > > > Option 2: Use boolean options and add 'enabled' as the suffix.
>> > > > > > > Option 3: Use boolean options and ONLY add 'enabled' when
>> there
>> > are
>> > > > > more
>> > > > > > > detailed configurations under the same prefix, to prevent one
>> > name
>> > > > from
>> > > > > > > serving as a prefix to another.
>> > > > > > >
>> > > > > > > I am slightly inclined to Option 3, since it is more in line
>> with
>> > > > > current
>> > > > > > > practice and friendly for existing users. Also It reduces the
>> > > length
>> > > > of
>> > > > > > > configuration names as much as possible. I really want to hear
>> > your
>> > > > > > > opinions.
>> > > > > > >
>> > > > > > >
>> > > > > > > @Xuannan
>> > > > > > >
>> > > > > > > I agree with your comments 1 and 3.
>> > > > > > >
>> > > > > > > For 2, If we decide to change the name, maybe
>> > > > > > > `execution.checkpointing.parallel-cleaner` is better? And as
>> for
>> > > > > whether
>> > > > > > to
>> > > > > > > add 'enabled' I suggest we discuss the rule above. WDYT?
>> > > > > > > Thanks!
>> > > > > > >
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Zakelly
>> > > > > > >
>> > > > > > > On Fri, Dec 29, 2023 at 12:02 PM Xuannan Su <
>> > suxuanna...@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Zakelly,
>> > > > > > > >
>> > > > > > > > Thanks for driving this! The organization of the
>> configuration
>> > > > option
>> > > > > > > > in the FLIP looks much cleaner and easier to understand. +1
>> to
>> > > the
>> > > > > > > > FLIP.
>> > > > > > > >
>> > > > > > > > Just some questions from me.
>> > > > > > > >
>> > > > > > > > 1. I think the change to the ConfigOptions should be put in
>> the
>> > > > > > > > `Public Interface` section, instead of `Proposed Changed`,
>> as
>> > > those
>> > > > > > > > configuration options are public interface.
>> > > > > > > >
>> > > > > > > > 2. The key `state.checkpoint.cleaner.parallel-mode` seems
>> > > > confusing.
>> > > > > > > > It feels like it is used to choose different modes. In
>> fact, it
>> > > is
>> > > > a
>> > > > > > > > boolean flag to indicate whether to enable parallel clean.
>> How
>> > > > about
>> > > > > > > > making it `state.checkpoint.cleaner.parallel-mode.enabled`?
>> > > > > > > >
>> > > > > > > > 3. The `execution.checkpointing.write-buffer` may better be
>> > > > > > > > `execution.checkpointing.write-buffer-size` so that we know
>> it
>> > is
>> > > > > > > > configuring the size of the buffer.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Xuannan
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Wed, Dec 27, 2023 at 7:17 PM Yanfei Lei <
>> > fredia...@gmail.com>
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > Hi Zakelly,
>> > > > > > > > >
>> > > > > > > > > > Considering the name occupation, how about naming it as
>> > > > > > > > `execution.checkpointing.type`?
>> > > > > > > > >
>> > > > > > > > > `Checkpoint Type`[1,2] is used to describe
>> aligned/unaligned
>> > > > > > > > > checkpoint, I am inclined to make a choice between
>> > > > > > > > > `execution.checkpointing.incremental` and
>> > > > > > > > > `execution.checkpointing.incremental.enabled`.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > [1]
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/monitoring/checkpoint_monitoring/
>> > > > > > > > > [2]
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/master/flink-runtime-web/web-dashboard/src/app/pages/job/checkpoints/detail/job-checkpoints-detail.component.html#L27
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Best,
>> > > > > > > > > Yanfei
>> > > > > > > > >
>> > > > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三
>> 14:41写道:
>> > > > > > > > > >
>> > > > > > > > > > Hi Lijie,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for the reminder! I missed this.
>> > > > > > > > > >
>> > > > > > > > > > Considering the name occupation, how about naming it as
>> > > > > > > > > > `execution.checkpointing.type`?
>> > > > > > > > > >
>> > > > > > > > > > Actually I think the current
>> `execution.checkpointing.mode`
>> > > is
>> > > > > > > > confusing in
>> > > > > > > > > > some ways, maybe
>> `execution.checkpointing.data-consistency`
>> > > is
>> > > > > > > better.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Best,
>> > > > > > > > > > Zakelly
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Wed, Dec 27, 2023 at 12:59 PM Lijie Wang <
>> > > > > > > wangdachui9...@gmail.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi Zakelly,
>> > > > > > > > > > >
>> > > > > > > > > > > >> I'm wondering if
>> > `execution.checkpointing.savepoint-dir`
>> > > > > would
>> > > > > > > be
>> > > > > > > > > > > better.
>> > > > > > > > > > >
>> > > > > > > > > > > `execution.checkpointing.dir` and
>> > > > > > > > `execution.checkpointing.savepoint-dir`
>> > > > > > > > > > > are also fine for me.
>> > > > > > > > > > >
>> > > > > > > > > > > >> So I think an enumeration option
>> > > > > > `execution.checkpointing.mode`
>> > > > > > > > which
>> > > > > > > > > > > can be 'full' (default) or 'incremental' would be
>> better
>> > > > > > > > > > >
>> > > > > > > > > > > I agree with using an enumeration option. But
>> currently
>> > > there
>> > > > > is
>> > > > > > > > already a
>> > > > > > > > > > > configuration option called
>> > `execution.checkpointing.mode`,
>> > > > > which
>> > > > > > > is
>> > > > > > > > used
>> > > > > > > > > > > to choose EXACTLY_ONCE or AT_LEAST_ONCE. Maybe we
>> need to
>> > > use
>> > > > > > > > another name
>> > > > > > > > > > > or merge these two options.
>> > > > > > > > > > >
>> > > > > > > > > > > Best,
>> > > > > > > > > > > Lijie
>> > > > > > > > > > >
>> > > > > > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三
>> > > 11:43写道:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi everyone,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks all for your comments!
>> > > > > > > > > > > >
>> > > > > > > > > > > > @Yanfei
>> > > > > > > > > > > >
>> > > > > > > > > > > > > 1. For some state backends that do not support
>> > > > incremental
>> > > > > > > > checkpoint,
>> > > > > > > > > > > > > how does the
>> > execution.checkpointing.incrementaloption
>> > > > take
>> > > > > > > > effect? Or
>> > > > > > > > > > > > > is it better to put incremental under
>> > > > > > > > state.backend.xxx.incremental?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > I'd rather not put the option for incremental
>> > checkpoint
>> > > > > under
>> > > > > > > the
>> > > > > > > > > > > > 'state.backend', since it is more about the
>> > checkpointing
>> > > > > > instead
>> > > > > > > > of
>> > > > > > > > > > > state
>> > > > > > > > > > > > accessing. Of course, the state backend may not
>> > > necessarily
>> > > > > do
>> > > > > > > > > > > incremental
>> > > > > > > > > > > > checkpoint as requested. If the state backend is not
>> > > > capable
>> > > > > of
>> > > > > > > > taking
>> > > > > > > > > > > > incremental cp, it is better to fallback to the full
>> > cp.
>> > > > > > > > > > > >
>> > > > > > > > > > > > 2. I'm a little worried that putting all
>> configurations
>> > > > into
>> > > > > > > > > > > > > `ExecutionCheckpointingOptions` will introduce
>> some
>> > > > > > dependency
>> > > > > > > > > > > > > problems. Some options would be used by
>> flink-runtime
>> > > > > module,
>> > > > > > > but
>> > > > > > > > > > > > > flink-runtime should not depend on
>> > > flink-streaming-java.
>> > > > > e.g.
>> > > > > > > > > > > > > FLINK-28286[1].
>> > > > > > > > > > > > > So, I prefer to move configurations to
>> > > > > > `CheckpointingOptions`,
>> > > > > > > > WDYT?
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Yes, that's a very good point.  Moving to
>> > > > > > > > > > > > `CheckpointingOptions`(flink-core) makes sense.
>> > > > > > > > > > > >
>> > > > > > > > > > > > @Lijie
>> > > > > > > > > > > >
>> > > > > > > > > > > > How about
>> > > > > > > > > > > > > state.savepoints.dir ->
>> > > > > execution.checkpointing.savepoint.dir
>> > > > > > > > > > > > > state.checkpoints.dir ->
>> > > > > > execution.checkpointing.checkpoint.dir
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Actually, I think the `checkpointing.checkpoint` may
>> > > cause
>> > > > > some
>> > > > > > > > > > > confusion.
>> > > > > > > > > > > > But I'm ok if others agree.
>> > > > > > > > > > > > I'm wondering if
>> > `execution.checkpointing.savepoint-dir`
>> > > > > would
>> > > > > > be
>> > > > > > > > better.
>> > > > > > > > > > > > WDYT?
>> > > > > > > > > > > >
>> > > > > > > > > > > > 2. We changed the
>> execution.checkpointing.local-copy'
>> > to
>> > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'.
>> Should
>> > we
>> > > > > also
>> > > > > > > add
>> > > > > > > > > > > > "enabled"
>> > > > > > > > > > > > > suffix for other boolean type configuration
>> options ?
>> > > For
>> > > > > > > > example,
>> > > > > > > > > > > > > execution.checkpointing.incremental ->
>> > > > > > > > > > > > > execution.checkpointing.incremental.enabled
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Actually, the incremental cp is something like
>> > choosing a
>> > > > > mode
>> > > > > > > for
>> > > > > > > > doing
>> > > > > > > > > > > > checkpoint instead of enabling a function. So I
>> think
>> > an
>> > > > > > > > enumeration
>> > > > > > > > > > > option
>> > > > > > > > > > > > `execution.checkpointing.mode` which can be 'full'
>> > > > (default)
>> > > > > or
>> > > > > > > > > > > > 'incremental' would be better, WDYT?
>> > > > > > > > > > > > And @Rui Fan @Yanfei What do you think about this?
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Tue, Dec 26, 2023 at 5:15 PM Lijie Wang <
>> > > > > > > > wangdachui9...@gmail.com>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Hi Zakelly,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Thanks for driving the discussion.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 1.
>> > > > > > > > > > > > > >> But I'm not so sure since there is only one
>> > > > > > > savepoint-related
>> > > > > > > > > > > option.
>> > > > > > > > > > > > > Maybe someone else could share some thoughts here.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > How about
>> > > > > > > > > > > > > state.savepoints.dir ->
>> > > > > execution.checkpointing.savepoint.dir
>> > > > > > > > > > > > > state.checkpoints.dir ->
>> > > > > > execution.checkpointing.checkpoint.dir
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 2. We changed the
>> execution.checkpointing.local-copy'
>> > > to
>> > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'.
>> Should
>> > we
>> > > > > also
>> > > > > > > add
>> > > > > > > > > > > > "enabled"
>> > > > > > > > > > > > > suffix for other boolean type configuration
>> options ?
>> > > For
>> > > > > > > > example,
>> > > > > > > > > > > > > execution.checkpointing.incremental ->
>> > > > > > > > > > > > > execution.checkpointing.incremental.enabled
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > In this way, the naming style of configuration
>> > options
>> > > is
>> > > > > > > > unified, and
>> > > > > > > > > > > it
>> > > > > > > > > > > > > can avoid potential similar problems (for
>> example, we
>> > > may
>> > > > > > need
>> > > > > > > > to add
>> > > > > > > > > > > > more
>> > > > > > > > > > > > > options for incremental checkpoint in the future).
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > Lijie
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Yanfei Lei <fredia...@gmail.com> 于2023年12月26日周二
>> > > 12:05写道:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi Zakelly,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thank you for creating the FLIP and starting the
>> > > > > > discussion.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > The current arrangement of these options is
>> indeed
>> > > > > somewhat
>> > > > > > > > > > > haphazard,
>> > > > > > > > > > > > > > and the new arrangement looks much better. I
>> have
>> > > some
>> > > > > > > > questions
>> > > > > > > > > > > about
>> > > > > > > > > > > > > > the arrangement of some new configuration
>> options:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > 1. For some state backends that do not support
>> > > > > incremental
>> > > > > > > > > > > checkpoint,
>> > > > > > > > > > > > > > how does the
>> > > execution.checkpointing.incrementaloption
>> > > > > take
>> > > > > > > > effect?
>> > > > > > > > > > > Or
>> > > > > > > > > > > > > > is it better to put incremental under
>> > > > > > > > state.backend.xxx.incremental?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > 2. I'm a little worried that putting all
>> > > configurations
>> > > > > > into
>> > > > > > > > > > > > > > `ExecutionCheckpointingOptions` will introduce
>> some
>> > > > > > > dependency
>> > > > > > > > > > > > > > problems. Some options would be used by
>> > flink-runtime
>> > > > > > module,
>> > > > > > > > but
>> > > > > > > > > > > > > > flink-runtime should not depend on
>> > > > flink-streaming-java.
>> > > > > > e.g.
>> > > > > > > > > > > > > > FLINK-28286[1].
>> > > > > > > > > > > > > > So, I prefer to move configurations to
>> > > > > > > `CheckpointingOptions`,
>> > > > > > > > WDYT?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > [1]
>> > > https://issues.apache.org/jira/browse/FLINK-28286
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > --
>> > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > Yanfei
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Zakelly Lan <zakelly....@gmail.com>
>> 于2023年12月25日周一
>> > > > > > 21:14写道:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Hi Rui Fan and Junrui,
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Thanks for the reminder! I agree to change the
>> > > > > > > > > > > > > > > 'execution.checkpointing.local-copy' to
>> > > > > > > > > > > > > > > 'execution.checkpointing.local-copy.enabled'.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > And for other suggestions Rui proposed:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 1. How about
>> execution.checkpointing.storage.type
>> > > > > instead
>> > > > > > > > > > > > > > > > of execution.checkpointing.storage?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Ah, I missed something here. Actually I
>> suggest
>> > we
>> > > > > could
>> > > > > > > > merge the
>> > > > > > > > > > > > > > current
>> > > > > > > > > > > > > > > 'state.checkpoints.dir' and
>> > > > 'state.checkpoint-storage'
>> > > > > > into
>> > > > > > > > one URI
>> > > > > > > > > > > > > > > configuration named
>> > 'execution.checkpointing.dir'.
>> > > > > WDYT?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 3. execution.checkpointing.savepoint.dir is a
>> > > little
>> > > > > > weird.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Yes, I think it is better to make 'savepoint'
>> and
>> > > > > > > > 'checkpoint' the
>> > > > > > > > > > > > same
>> > > > > > > > > > > > > > > level. But I'm not so sure since there is only
>> > one
>> > > > > > > > > > > savepoint-related
>> > > > > > > > > > > > > > > option. Maybe someone else could share some
>> > > thoughts
>> > > > > > here.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > 4. How about execution.recovery.claim-mode
>> > instead
>> > > of
>> > > > > > > > > > > > > > > > execution.recovery.mode?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >  Agreed. That's more accurate.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Many thanks for your suggestions!
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > Zakelly
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Mon, Dec 25, 2023 at 8:18 PM Junrui Lee <
>> > > > > > > > jrlee....@gmail.com>
>> > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hi Zakelly,
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thanks for driving this. I agree that the
>> > > proposed
>> > > > > > > > restructuring
>> > > > > > > > > > > of
>> > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > configuration options is largely positive.
>> It
>> > > will
>> > > > > make
>> > > > > > > > > > > > understanding
>> > > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > > working with Flink configurations more
>> > intuitive.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Most of the proposed changes look great.
>> Just a
>> > > > > > heads-up,
>> > > > > > > > as Rui
>> > > > > > > > > > > > Fan
>> > > > > > > > > > > > > > > > mentioned, Flink currently requires that no
>> > > > > > > configOption's
>> > > > > > > > key be
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > prefix of another to avoid issues when we
>> > > > eventually
>> > > > > > > adopt
>> > > > > > > > a
>> > > > > > > > > > > > standard
>> > > > > > > > > > > > > > YAML
>> > > > > > > > > > > > > > > > parser, as detailed in FLINK-29372 (
>> > > > > > > > > > > > > > > >
>> > > https://issues.apache.org/jira/browse/FLINK-29372
>> > > > ).
>> > > > > > > > Therefore,
>> > > > > > > > > > > > it's
>> > > > > > > > > > > > > > better
>> > > > > > > > > > > > > > > > to change the key
>> > > > > 'execution.checkpointing.local-copy'
>> > > > > > > > because it
>> > > > > > > > > > > > > > serves as
>> > > > > > > > > > > > > > > > a prefix to the key
>> > > > > > > > 'execution.checkpointing.local-copy.dir'.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Best regards,
>> > > > > > > > > > > > > > > > Junrui
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Rui Fan <1996fan...@gmail.com>
>> 于2023年12月25日周一
>> > > > > 19:11写道:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Hi Zakelly,
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Thank you for driving this proposal!
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Overall good for me. I have some questions
>> > > about
>> > > > > > these
>> > > > > > > > names.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 1. How about
>> > > execution.checkpointing.storage.type
>> > > > > > > > instead of
>> > > > > > > > > > > > > > > > > execution.checkpointing.storage?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > It's similar to state.backend.type.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 2. How about
>> > > > > > execution.checkpointing.local-copy.enabled
>> > > > > > > > instead
>> > > > > > > > > > > > of
>> > > > > > > > > > > > > > > > > execution.checkpointing.local-copy?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > You added a new option:
>> > > > > > > > execution.checkpointing.local-copy.dir.
>> > > > > > > > > > > > > > > > > IIUC, one option name shouldn't be the
>> prefix
>> > > of
>> > > > > > other
>> > > > > > > > options.
>> > > > > > > > > > > > > > > > > If you add a new option
>> > > > > > > > execution.checkpointing.local-copy,
>> > > > > > > > > > > > > > > > > flink CI will fail directly.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 3. execution.checkpointing.savepoint.dir
>> is a
>> > > > > little
>> > > > > > > > weird.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > For old options: state.savepoints.dir and
>> > > > > > > > > > > state.checkpoints.dir,
>> > > > > > > > > > > > > > > > > the savepoint and checkpoint are the same
>> > > level.
>> > > > It
>> > > > > > > means
>> > > > > > > > > > > > > > > > > it's a checkpoint or savepoint.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > The new option
>> execution.checkpointing.dir is
>> > > > fine
>> > > > > > for
>> > > > > > > > me.
>> > > > > > > > > > > > > > > > > However,
>> > execution.checkpointing.savepoint.dir
>> > > > is a
>> > > > > > > > little
>> > > > > > > > > > > weird.
>> > > > > > > > > > > > > > > > > I don't know which name is better now.
>> Let us
>> > > > think
>> > > > > > > > about it
>> > > > > > > > > > > > more.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > 4. How about execution.recovery.claim-mode
>> > > > instead
>> > > > > of
>> > > > > > > > > > > > > > > > > execution.recovery.mode?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > The meaning of mode is too broad. The
>> > > claim-mode
>> > > > > may
>> > > > > > > > > > > > > > > > > be more accurate for users.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > WDYT?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > > > Rui
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Mon, Dec 25, 2023 at 5:14 PM Zakelly
>> Lan <
>> > > > > > > > > > > > zakelly....@gmail.com
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Hi devs,
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > I'd like to start a discussion on
>> FLIP-406:
>> > > > > > > Reorganize
>> > > > > > > > State
>> > > > > > > > > > > &
>> > > > > > > > > > > > > > > > > > Checkpointing & Recovery
>> Configuration[1].
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Currently, the configuration options
>> > > pertaining
>> > > > > to
>> > > > > > > > > > > > checkpointing,
>> > > > > > > > > > > > > > > > > recovery,
>> > > > > > > > > > > > > > > > > > and state management are primarily
>> grouped
>> > > > under
>> > > > > > the
>> > > > > > > > > > > following
>> > > > > > > > > > > > > > > > prefixes:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >    - state.backend.* : configurations
>> > related
>> > > > to
>> > > > > > > state
>> > > > > > > > > > > > accessing
>> > > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > > > >    checkpointing, as well as specific
>> > options
>> > > > for
>> > > > > > > > individual
>> > > > > > > > > > > > > state
>> > > > > > > > > > > > > > > > > backends
>> > > > > > > > > > > > > > > > > >    - execution.checkpointing.* :
>> > > configurations
>> > > > > > > > associated
>> > > > > > > > > > > with
>> > > > > > > > > > > > > > > > > checkpoint
>> > > > > > > > > > > > > > > > > >    execution and recovery
>> > > > > > > > > > > > > > > > > >    - execution.savepoint.*:
>> configurations
>> > > for
>> > > > > > > > recovery from
>> > > > > > > > > > > > > > savepoint
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > In addition, there are several
>> individual
>> > > > options
>> > > > > > > such
>> > > > > > > > as '
>> > > > > > > > > > > > > > > > > > *state.checkpoint-storage*' and
>> > > > > > > > '*state.checkpoints.dir*'
>> > > > > > > > > > > that
>> > > > > > > > > > > > > fall
>> > > > > > > > > > > > > > > > > outside
>> > > > > > > > > > > > > > > > > > of these prefixes. The current
>> arrangement
>> > of
>> > > > > these
>> > > > > > > > options,
>> > > > > > > > > > > > > which
>> > > > > > > > > > > > > > span
>> > > > > > > > > > > > > > > > > > multiple modules, is somewhat haphazard
>> and
>> > > > > lacks a
>> > > > > > > > > > > systematic
>> > > > > > > > > > > > > > > > structure.
>> > > > > > > > > > > > > > > > > > For example, the options under the
>> > > > > > > > '*CheckpointingOptions*'
>> > > > > > > > > > > > and '
>> > > > > > > > > > > > > > > > > > *ExecutionCheckpointingOptions*' are
>> > related
>> > > > and
>> > > > > > have
>> > > > > > > > no
>> > > > > > > > > > > clear
>> > > > > > > > > > > > > > > > boundaries
>> > > > > > > > > > > > > > > > > > from the user's perspective, but there
>> is
>> > no
>> > > > > > unified
>> > > > > > > > prefix
>> > > > > > > > > > > for
>> > > > > > > > > > > > > > them.
>> > > > > > > > > > > > > > > > > With
>> > > > > > > > > > > > > > > > > > the upcoming release of Flink 2.0, we
>> have
>> > an
>> > > > > > > excellent
>> > > > > > > > > > > > > > opportunity to
>> > > > > > > > > > > > > > > > > > overhaul and restructure the
>> configurations
>> > > > > related
>> > > > > > > to
>> > > > > > > > > > > > > > checkpointing,
>> > > > > > > > > > > > > > > > > > recovery, and state management. This
>> FLIP
>> > > > > proposes
>> > > > > > to
>> > > > > > > > > > > > reorganize
>> > > > > > > > > > > > > > these
>> > > > > > > > > > > > > > > > > > settings, making it more coherent by
>> > module,
>> > > > > which
>> > > > > > > > would
>> > > > > > > > > > > > > > significantly
>> > > > > > > > > > > > > > > > > > lower the barriers for understanding and
>> > > reduce
>> > > > > the
>> > > > > > > > > > > development
>> > > > > > > > > > > > > > costs
>> > > > > > > > > > > > > > > > > > moving forward.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Looking forward to hearing from you!
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > [1]
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > > > > > Zakelly
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>> --
>> Best,
>> Hangxiang.
>>
>

Reply via email to