Hi Zakelly,

I'm not sure whether we could add the state backend type in the
new key name of state.backend.incremental. It means we use
`execution.checkpointing.rocksdb-incremental` or
`execution.checkpointing.rocksdb-incremental.enabled`.

So far, state.backend.incremental only works for rocksdb state backend.
And this feature or optimization is very valuable and huge for large
state flink jobs. I believe it's enabled for most production flink jobs
with large rocksdb state.

If this option isn't generic for all state backend types, I guess we
can enable `execution.checkpointing.rocksdb-incremental.enabled`
by default in Flink 2.0.

But if it works for all state backends, it's hard to enable it by default.
Enabling great and valuable features or improvements are useful
for users, especially a lot of new flink users. Out-of-the-box options
are good for users.

WDYT?

Best,
Rui

On Fri, Dec 29, 2023 at 1:45 PM Zakelly Lan <zakelly....@gmail.com> wrote:

> Hi everyone,
>
> Thanks all for your comments!
>
> As many of you have questions about the names for boolean options, I
> suggest we make a naming rule for them. For now I could think of three
> options:
>
> Option 1: Use enumeration options if possible. But this may cause some name
> collisions or confusion as we discussed and we should unify the statement
> everywhere.
> Option 2: Use boolean options and add 'enabled' as the suffix.
> Option 3: Use boolean options and ONLY add 'enabled' when there are more
> detailed configurations under the same prefix, to prevent one name from
> serving as a prefix to another.
>
> I am slightly inclined to Option 3, since it is more in line with current
> practice and friendly for existing users. Also It reduces the length of
> configuration names as much as possible. I really want to hear your
> opinions.
>
>
> @Xuannan
>
> I agree with your comments 1 and 3.
>
> For 2, If we decide to change the name, maybe
> `execution.checkpointing.parallel-cleaner` is better? And as for whether to
> add 'enabled' I suggest we discuss the rule above. WDYT?
> Thanks!
>
>
> Best,
> Zakelly
>
> On Fri, Dec 29, 2023 at 12:02 PM Xuannan Su <suxuanna...@gmail.com> wrote:
>
> > Hi Zakelly,
> >
> > Thanks for driving this! The organization of the configuration option
> > in the FLIP looks much cleaner and easier to understand. +1 to the
> > FLIP.
> >
> > Just some questions from me.
> >
> > 1. I think the change to the ConfigOptions should be put in the
> > `Public Interface` section, instead of `Proposed Changed`, as those
> > configuration options are public interface.
> >
> > 2. The key `state.checkpoint.cleaner.parallel-mode` seems confusing.
> > It feels like it is used to choose different modes. In fact, it is a
> > boolean flag to indicate whether to enable parallel clean. How about
> > making it `state.checkpoint.cleaner.parallel-mode.enabled`?
> >
> > 3. The `execution.checkpointing.write-buffer` may better be
> > `execution.checkpointing.write-buffer-size` so that we know it is
> > configuring the size of the buffer.
> >
> > Best,
> > Xuannan
> >
> >
> > On Wed, Dec 27, 2023 at 7:17 PM Yanfei Lei <fredia...@gmail.com> wrote:
> > >
> > > Hi Zakelly,
> > >
> > > > Considering the name occupation, how about naming it as
> > `execution.checkpointing.type`?
> > >
> > > `Checkpoint Type`[1,2] is used to describe aligned/unaligned
> > > checkpoint, I am inclined to make a choice between
> > > `execution.checkpointing.incremental` and
> > > `execution.checkpointing.incremental.enabled`.
> > >
> > >
> > > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/monitoring/checkpoint_monitoring/
> > > [2]
> >
> https://github.com/apache/flink/blob/master/flink-runtime-web/web-dashboard/src/app/pages/job/checkpoints/detail/job-checkpoints-detail.component.html#L27
> > >
> > > --
> > > Best,
> > > Yanfei
> > >
> > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三 14:41写道:
> > > >
> > > > Hi Lijie,
> > > >
> > > > Thanks for the reminder! I missed this.
> > > >
> > > > Considering the name occupation, how about naming it as
> > > > `execution.checkpointing.type`?
> > > >
> > > > Actually I think the current `execution.checkpointing.mode` is
> > confusing in
> > > > some ways, maybe `execution.checkpointing.data-consistency` is
> better.
> > > >
> > > >
> > > > Best,
> > > > Zakelly
> > > >
> > > >
> > > > On Wed, Dec 27, 2023 at 12:59 PM Lijie Wang <
> wangdachui9...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Zakelly,
> > > > >
> > > > > >> I'm wondering if `execution.checkpointing.savepoint-dir` would
> be
> > > > > better.
> > > > >
> > > > > `execution.checkpointing.dir` and
> > `execution.checkpointing.savepoint-dir`
> > > > > are also fine for me.
> > > > >
> > > > > >> So I think an enumeration option `execution.checkpointing.mode`
> > which
> > > > > can be 'full' (default) or 'incremental' would be better
> > > > >
> > > > > I agree with using an enumeration option. But currently there is
> > already a
> > > > > configuration option called `execution.checkpointing.mode`, which
> is
> > used
> > > > > to choose EXACTLY_ONCE or AT_LEAST_ONCE. Maybe we need to use
> > another name
> > > > > or merge these two options.
> > > > >
> > > > > Best,
> > > > > Lijie
> > > > >
> > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月27日周三 11:43写道:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Thanks all for your comments!
> > > > > >
> > > > > > @Yanfei
> > > > > >
> > > > > > > 1. For some state backends that do not support incremental
> > checkpoint,
> > > > > > > how does the execution.checkpointing.incrementaloption take
> > effect? Or
> > > > > > > is it better to put incremental under
> > state.backend.xxx.incremental?
> > > > > > >
> > > > > > I'd rather not put the option for incremental checkpoint under
> the
> > > > > > 'state.backend', since it is more about the checkpointing instead
> > of
> > > > > state
> > > > > > accessing. Of course, the state backend may not necessarily do
> > > > > incremental
> > > > > > checkpoint as requested. If the state backend is not capable of
> > taking
> > > > > > incremental cp, it is better to fallback to the full cp.
> > > > > >
> > > > > > 2. I'm a little worried that putting all configurations into
> > > > > > > `ExecutionCheckpointingOptions` will introduce some dependency
> > > > > > > problems. Some options would be used by flink-runtime module,
> but
> > > > > > > flink-runtime should not depend on flink-streaming-java. e.g.
> > > > > > > FLINK-28286[1].
> > > > > > > So, I prefer to move configurations to `CheckpointingOptions`,
> > WDYT?
> > > > > > >
> > > > > >
> > > > > > Yes, that's a very good point.  Moving to
> > > > > > `CheckpointingOptions`(flink-core) makes sense.
> > > > > >
> > > > > > @Lijie
> > > > > >
> > > > > > How about
> > > > > > > state.savepoints.dir -> execution.checkpointing.savepoint.dir
> > > > > > > state.checkpoints.dir -> execution.checkpointing.checkpoint.dir
> > > > > >
> > > > > >
> > > > > > Actually, I think the `checkpointing.checkpoint` may cause some
> > > > > confusion.
> > > > > > But I'm ok if others agree.
> > > > > > I'm wondering if `execution.checkpointing.savepoint-dir` would be
> > better.
> > > > > > WDYT?
> > > > > >
> > > > > > 2. We changed the execution.checkpointing.local-copy' to
> > > > > > > 'execution.checkpointing.local-copy.enabled'. Should we also
> add
> > > > > > "enabled"
> > > > > > > suffix for other boolean type configuration options ? For
> > example,
> > > > > > > execution.checkpointing.incremental ->
> > > > > > > execution.checkpointing.incremental.enabled
> > > > > > >
> > > > > >
> > > > > > Actually, the incremental cp is something like choosing a mode
> for
> > doing
> > > > > > checkpoint instead of enabling a function. So I think an
> > enumeration
> > > > > option
> > > > > > `execution.checkpointing.mode` which can be 'full' (default) or
> > > > > > 'incremental' would be better, WDYT?
> > > > > > And @Rui Fan @Yanfei What do you think about this?
> > > > > >
> > > > > >
> > > > > > On Tue, Dec 26, 2023 at 5:15 PM Lijie Wang <
> > wangdachui9...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Zakelly,
> > > > > > >
> > > > > > > Thanks for driving the discussion.
> > > > > > >
> > > > > > > 1.
> > > > > > > >> But I'm not so sure since there is only one
> savepoint-related
> > > > > option.
> > > > > > > Maybe someone else could share some thoughts here.
> > > > > > >
> > > > > > > How about
> > > > > > > state.savepoints.dir -> execution.checkpointing.savepoint.dir
> > > > > > > state.checkpoints.dir -> execution.checkpointing.checkpoint.dir
> > > > > > >
> > > > > > > 2. We changed the execution.checkpointing.local-copy' to
> > > > > > > 'execution.checkpointing.local-copy.enabled'. Should we also
> add
> > > > > > "enabled"
> > > > > > > suffix for other boolean type configuration options ? For
> > example,
> > > > > > > execution.checkpointing.incremental ->
> > > > > > > execution.checkpointing.incremental.enabled
> > > > > > >
> > > > > > > In this way, the naming style of configuration options is
> > unified, and
> > > > > it
> > > > > > > can avoid potential similar problems (for example, we may need
> > to add
> > > > > > more
> > > > > > > options for incremental checkpoint in the future).
> > > > > > >
> > > > > > > Best,
> > > > > > > Lijie
> > > > > > >
> > > > > > > Yanfei Lei <fredia...@gmail.com> 于2023年12月26日周二 12:05写道:
> > > > > > >
> > > > > > > > Hi Zakelly,
> > > > > > > >
> > > > > > > > Thank you for creating the FLIP and starting the discussion.
> > > > > > > >
> > > > > > > > The current arrangement of these options is indeed somewhat
> > > > > haphazard,
> > > > > > > > and the new arrangement looks much better. I have some
> > questions
> > > > > about
> > > > > > > > the arrangement of some new configuration options:
> > > > > > > >
> > > > > > > > 1. For some state backends that do not support incremental
> > > > > checkpoint,
> > > > > > > > how does the execution.checkpointing.incrementaloption take
> > effect?
> > > > > Or
> > > > > > > > is it better to put incremental under
> > state.backend.xxx.incremental?
> > > > > > > >
> > > > > > > > 2. I'm a little worried that putting all configurations into
> > > > > > > > `ExecutionCheckpointingOptions` will introduce some
> dependency
> > > > > > > > problems. Some options would be used by flink-runtime module,
> > but
> > > > > > > > flink-runtime should not depend on flink-streaming-java. e.g.
> > > > > > > > FLINK-28286[1].
> > > > > > > > So, I prefer to move configurations to
> `CheckpointingOptions`,
> > WDYT?
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-28286
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best,
> > > > > > > > Yanfei
> > > > > > > >
> > > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年12月25日周一 21:14写道:
> > > > > > > > >
> > > > > > > > > Hi Rui Fan and Junrui,
> > > > > > > > >
> > > > > > > > > Thanks for the reminder! I agree to change the
> > > > > > > > > 'execution.checkpointing.local-copy' to
> > > > > > > > > 'execution.checkpointing.local-copy.enabled'.
> > > > > > > > >
> > > > > > > > > And for other suggestions Rui proposed:
> > > > > > > > >
> > > > > > > > > 1. How about execution.checkpointing.storage.type instead
> > > > > > > > > > of execution.checkpointing.storage?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ah, I missed something here. Actually I suggest we could
> > merge the
> > > > > > > > current
> > > > > > > > > 'state.checkpoints.dir' and 'state.checkpoint-storage' into
> > one URI
> > > > > > > > > configuration named 'execution.checkpointing.dir'. WDYT?
> > > > > > > > >
> > > > > > > > > 3. execution.checkpointing.savepoint.dir is a little weird.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, I think it is better to make 'savepoint' and
> > 'checkpoint' the
> > > > > > same
> > > > > > > > > level. But I'm not so sure since there is only one
> > > > > savepoint-related
> > > > > > > > > option. Maybe someone else could share some thoughts here.
> > > > > > > > >
> > > > > > > > > 4. How about execution.recovery.claim-mode instead of
> > > > > > > > > > execution.recovery.mode?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >  Agreed. That's more accurate.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Many thanks for your suggestions!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Zakelly
> > > > > > > > >
> > > > > > > > > On Mon, Dec 25, 2023 at 8:18 PM Junrui Lee <
> > jrlee....@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Zakelly,
> > > > > > > > > >
> > > > > > > > > > Thanks for driving this. I agree that the proposed
> > restructuring
> > > > > of
> > > > > > > the
> > > > > > > > > > configuration options is largely positive. It will make
> > > > > > understanding
> > > > > > > > and
> > > > > > > > > > working with Flink configurations more intuitive.
> > > > > > > > > >
> > > > > > > > > > Most of the proposed changes look great. Just a heads-up,
> > as Rui
> > > > > > Fan
> > > > > > > > > > mentioned, Flink currently requires that no
> configOption's
> > key be
> > > > > > the
> > > > > > > > > > prefix of another to avoid issues when we eventually
> adopt
> > a
> > > > > > standard
> > > > > > > > YAML
> > > > > > > > > > parser, as detailed in FLINK-29372 (
> > > > > > > > > > https://issues.apache.org/jira/browse/FLINK-29372).
> > Therefore,
> > > > > > it's
> > > > > > > > better
> > > > > > > > > > to change the key 'execution.checkpointing.local-copy'
> > because it
> > > > > > > > serves as
> > > > > > > > > > a prefix to the key
> > 'execution.checkpointing.local-copy.dir'.
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Junrui
> > > > > > > > > >
> > > > > > > > > > Rui Fan <1996fan...@gmail.com> 于2023年12月25日周一 19:11写道:
> > > > > > > > > >
> > > > > > > > > > > Hi Zakelly,
> > > > > > > > > > >
> > > > > > > > > > > Thank you for driving this proposal!
> > > > > > > > > > >
> > > > > > > > > > > Overall good for me. I have some questions about these
> > names.
> > > > > > > > > > >
> > > > > > > > > > > 1. How about execution.checkpointing.storage.type
> > instead of
> > > > > > > > > > > execution.checkpointing.storage?
> > > > > > > > > > >
> > > > > > > > > > > It's similar to state.backend.type.
> > > > > > > > > > >
> > > > > > > > > > > 2. How about execution.checkpointing.local-copy.enabled
> > instead
> > > > > > of
> > > > > > > > > > > execution.checkpointing.local-copy?
> > > > > > > > > > >
> > > > > > > > > > > You added a new option:
> > execution.checkpointing.local-copy.dir.
> > > > > > > > > > > IIUC, one option name shouldn't be the prefix of other
> > options.
> > > > > > > > > > > If you add a new option
> > execution.checkpointing.local-copy,
> > > > > > > > > > > flink CI will fail directly.
> > > > > > > > > > >
> > > > > > > > > > > 3. execution.checkpointing.savepoint.dir is a little
> > weird.
> > > > > > > > > > >
> > > > > > > > > > > For old options: state.savepoints.dir and
> > > > > state.checkpoints.dir,
> > > > > > > > > > > the savepoint and checkpoint are the same level. It
> means
> > > > > > > > > > > it's a checkpoint or savepoint.
> > > > > > > > > > >
> > > > > > > > > > > The new option execution.checkpointing.dir is fine for
> > me.
> > > > > > > > > > > However, execution.checkpointing.savepoint.dir is a
> > little
> > > > > weird.
> > > > > > > > > > > I don't know which name is better now. Let us think
> > about it
> > > > > > more.
> > > > > > > > > > >
> > > > > > > > > > > 4. How about execution.recovery.claim-mode instead of
> > > > > > > > > > > execution.recovery.mode?
> > > > > > > > > > >
> > > > > > > > > > > The meaning of mode is too broad. The claim-mode may
> > > > > > > > > > > be more accurate for users.
> > > > > > > > > > >
> > > > > > > > > > > WDYT?
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Rui
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 25, 2023 at 5:14 PM Zakelly Lan <
> > > > > > zakelly....@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi devs,
> > > > > > > > > > > >
> > > > > > > > > > > > I'd like to start a discussion on FLIP-406:
> Reorganize
> > State
> > > > > &
> > > > > > > > > > > > Checkpointing & Recovery Configuration[1].
> > > > > > > > > > > >
> > > > > > > > > > > > Currently, the configuration options pertaining to
> > > > > > checkpointing,
> > > > > > > > > > > recovery,
> > > > > > > > > > > > and state management are primarily grouped under the
> > > > > following
> > > > > > > > > > prefixes:
> > > > > > > > > > > >
> > > > > > > > > > > >    - state.backend.* : configurations related to
> state
> > > > > > accessing
> > > > > > > > and
> > > > > > > > > > > >    checkpointing, as well as specific options for
> > individual
> > > > > > > state
> > > > > > > > > > > backends
> > > > > > > > > > > >    - execution.checkpointing.* : configurations
> > associated
> > > > > with
> > > > > > > > > > > checkpoint
> > > > > > > > > > > >    execution and recovery
> > > > > > > > > > > >    - execution.savepoint.*: configurations for
> > recovery from
> > > > > > > > savepoint
> > > > > > > > > > > >
> > > > > > > > > > > > In addition, there are several individual options
> such
> > as '
> > > > > > > > > > > > *state.checkpoint-storage*' and
> > '*state.checkpoints.dir*'
> > > > > that
> > > > > > > fall
> > > > > > > > > > > outside
> > > > > > > > > > > > of these prefixes. The current arrangement of these
> > options,
> > > > > > > which
> > > > > > > > span
> > > > > > > > > > > > multiple modules, is somewhat haphazard and lacks a
> > > > > systematic
> > > > > > > > > > structure.
> > > > > > > > > > > > For example, the options under the
> > '*CheckpointingOptions*'
> > > > > > and '
> > > > > > > > > > > > *ExecutionCheckpointingOptions*' are related and have
> > no
> > > > > clear
> > > > > > > > > > boundaries
> > > > > > > > > > > > from the user's perspective, but there is no unified
> > prefix
> > > > > for
> > > > > > > > them.
> > > > > > > > > > > With
> > > > > > > > > > > > the upcoming release of Flink 2.0, we have an
> excellent
> > > > > > > > opportunity to
> > > > > > > > > > > > overhaul and restructure the configurations related
> to
> > > > > > > > checkpointing,
> > > > > > > > > > > > recovery, and state management. This FLIP proposes to
> > > > > > reorganize
> > > > > > > > these
> > > > > > > > > > > > settings, making it more coherent by module, which
> > would
> > > > > > > > significantly
> > > > > > > > > > > > lower the barriers for understanding and reduce the
> > > > > development
> > > > > > > > costs
> > > > > > > > > > > > moving forward.
> > > > > > > > > > > >
> > > > > > > > > > > > Looking forward to hearing from you!
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=284789560
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Zakelly
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> >
>

Reply via email to