Hey @Lihe Ma,

Thanks and I appreciate your feedback!

*What do I think of introducing a new state backend?*
I am hesitant to follow this way of introducing a new state backend to
implement `HashMapStateBackend with incremental checkpoint enabled`,
because:
- It causes more confusion for users: why incremental checkpoint can
be *enabled
through a flag for RocksDB* but having to *write a new Statebackend for
HashMap*?
- For the same reason above, this causes *inconsistent system behavior* and
potentially lead to more problems.
    For example, do we want to eventually keep both state backends (HashMap
and the newly introduced one) or deprecate one?
    How to deprecate one if both are widely used in prod, especially since
people can set up state backends in the job code, which means they have to
change/update their job code for depreciation?
- Most importantly, I do not think introducing a new state backend to
implement `HashMapStateBackend with incremental checkpoint enabled`
resolves the problem. I will explain that later.

Now let's come back to the original proposal
*Make incremental checkpoint as default for RocksDB?*
As I've already stated previously, I am +1 on this. And the reason is well
explained (the motivation,  backward compatibilities, and previous glitches
solved by FLIP 193 e.t.c) in the original proposal and other
threads/tickets.

However, there is a gap between the above goal and the way proposed to
achieve it: "set as the default state.backend.incremental = true". The way
proposed is to make incremental checkpoint default universally for all
state backends.

*Should we make incremental checkpoints as default universally* *for all
state backends?*
I am not sure about this, or at least before a few questions are
answered/discussed/solved.

1. Is incremental checkpoints as default needed for all state backends?
     Probably not, but if one state backend (like RocksDB) has strong
benefits, but others are neutral or non-neg effects, it is acceptable.
2. What is the impact of the default flag switch on the currently supported
state backends?
     RocksDB prefers incremental checkpoints, and HashMap (In Memory) does
not support incremental yet (no effect).
3. How about system extensibility and future migration?
    There is a problem here. Nowadays, HashMap (In Memory) state
backends are widely used in prod as well when the state size is small
enough (in most cases based on user feedback). The support for incremental
checkpoints on HashMapStatebackend is asked by users. Then think about
this: if we set default state.backend.incremental = true, and launch the
incremental checkpoint feature for HashMapStatebackend, there is a big
upgrading cost for existing users that uses HashMapStatebackend.
     Introducing a new state backend does not solve the problem of "system
extensibility". It introduces new problems like "deprecating and migration"
as stated above.

*What is my suggestion?*
- Design a way to support different default configurations for different
state backends (I do not think it is difficult to do).
- Have a better way to solve the "system extensibility" problem.

Best,
Yuan


On Fri, Jun 17, 2022 at 11:49 AM LuNing Wang <wang4lun...@gmail.com> wrote:

> Strongly +1
>
> Best,
> LuNing Wang
>
> Zakelly Lan <zakelly....@gmail.com> 于2022年6月17日周五 00:15写道:
>
> > Thanks for bringing this up.
> > I'm +1 on enabling the incremental checkpoint  by default on RocksDB.
> But I
> > also agree with Yuan about not enabling this on newly implemented
> > incremental checkpoint for hashmap statebackend.
> > I am wondering can we make it behave differently for different state
> > backends when ```state.backend.incremental``` is not set? Although it may
> > bring some confusion in the default behaviour.
> >
> > On Thu, Jun 16, 2022 at 10:18 PM Lihe Ma <ma_l...@163.com> wrote:
> >
> > > Thanks for the suggestion, Yuan.
> > >
> > >
> > > How about naming the newly in-memory state-backend, which supports
> > > incremental checkpoint, as HeapStateBackend . And let the default
> > > state-backend still stay as HashMapStateBackend.
> > > By doing so, we can:
> > > 1) the default value of parameter state.backend.incremental could be
> set
> > > as true safely, which is so widely accepted according to the replies in
> > > this thread.
> > > 2) your newly in-memory state-backend could also be merged and users
> who
> > > want to try in-memory incremental checkpoints could also have a
> solution.
> > > We can also set the newly HeapStateBackend as default state-backend in
> > the
> > > future if the feature of incremental checkpoint is stable. What do you
> > > think?
> > >
> > >
> > > Best,
> > > Lihe Ma
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > 在 2022-06-15 21:03:31,"Feifan Wang" <zoltar9...@163.com> 写道:
> > > >Thanks for bringing this up.
> > > >Strongly +1
> > > >
> > > >
> > > >
> > > >
> > > >——————————————
> > > >Name: Feifan Wang
> > > >Email: zoltar9...@163.com
> > > >
> > > >
> > > >---- Replied Message ----
> > > >| From | Yuan Mei<yuanmei.w...@gmail.com> |
> > > >| Date | 06/15/2022 11:41 |
> > > >| To | dev<dev@flink.apache.org> ,
> > > ><ro...@ververica.com> |
> > > >| Subject | Re: [DISCUSS ] Make state.backend.incremental as true by
> > > default |
> > > >Thanks for bringing this up.
> > > >
> > > >I am +1 on making incremental checkpoints by default for RocksDB, but
> > not
> > > >universally for all state backends.
> > > >
> > > >Besides being widely used in prod, enabling incremental checkpoint for
> > > >RocksDB by default is also a pre-requisite when enabling task-local by
> > > >default FLINK-15507 <
> https://issues.apache.org/jira/browse/FLINK-15507>
> > > >
> > > >The incremental checkpoint for the hashmap statebackend is under
> review
> > > >right now. CC @ro...@ververica.com <ro...@ververica.com> , which is
> > not a
> > > >good idea being enabled by default in the first version.
> > > >
> > > >Best,
> > > >
> > > >Yuan
> > > >
> > > >On Tue, Jun 14, 2022 at 7:33 PM Jiangang Liu <
> liujiangangp...@gmail.com
> > >
> > > >wrote:
> > > >
> > > >+1 for the suggestion. We have use the incremental checkpoint in our
> > > >production for a long time.
> > > >
> > > >Hangxiang Yu <master...@gmail.com> 于2022年6月14日周二 15:41写道:
> > > >
> > > >+1
> > > >It's basically enabled in most scenarios in production environments.
> > > >For HashMapStateBackend, it will adopt a full checkpoint even if we
> > > >enable
> > > >incremental checkpoint. It will also support incremental checkpoint
> > after
> > > >[1]. It's compatible.
> > > >BTW, I think we may also need to improve the documentation of
> > incremental
> > > >checkpoints which users usually ask. There are some tickets like
> [2][3].
> > > >
> > > >Best,
> > > >Hangxiang.
> > > >
> > > >[1] https://issues.apache.org/jira/browse/FLINK-21648
> > > >[2] https://issues.apache.org/jira/browse/FLINK-22797
> > > >[3] https://issues.apache.org/jira/browse/FLINK-7449
> > > >
> > > >On Mon, Jun 13, 2022 at 7:48 PM Rui Fan <1996fan...@gmail.com> wrote:
> > > >
> > > >Strongly +1
> > > >
> > > >Best,
> > > >Rui Fan
> > > >
> > > >On Mon, Jun 13, 2022 at 7:35 PM Martijn Visser <
> > > >martijnvis...@apache.org
> > > >
> > > >wrote:
> > > >
> > > >BTW, from my knowledge, nothing would happen for
> > > >HashMapStateBackend,
> > > >which does not support incremental checkpoint yet, when enabling
> > > >incremental checkpoints.
> > > >
> > > >Thanks Yun, if no errors would occur then definitely +1 to enable it
> > > >by
> > > >default
> > > >
> > > >Op ma 13 jun. 2022 om 12:42 schreef Alexander Fedulov <
> > > >alexan...@ververica.com>:
> > > >
> > > >+1
> > > >
> > > >From my experience, it is actually hard to come up with use cases
> > > >where
> > > >incremental checkpoints should explicitly not be enabled with the
> > > >RocksDB
> > > >state backend. If the state is so small that the full snapshots do
> > > >not
> > > >have any negative impact, one should consider using
> > > >HashMapStateBackend
> > > >anyway.
> > > >
> > > >Best,
> > > >Alexander Fedulov
> > > >
> > > >
> > > >On Mon, Jun 13, 2022 at 12:26 PM Jing Ge <j...@ververica.com>
> > > >wrote:
> > > >
> > > >+1
> > > >
> > > >Glad to see the kickoff of this discussion. Thanks Lihe for
> > > >driving
> > > >this!
> > > >
> > > >We have actually already discussed it internally a few months
> > > >ago.
> > > >After
> > > >considering some corner cases, all agreed on enabling the
> > > >incremental
> > > >checkpoint as default.
> > > >
> > > >Best regards,
> > > >Jing
> > > >
> > > >On Mon, Jun 13, 2022 at 12:17 PM Yun Tang <myas...@live.com>
> > > >wrote:
> > > >
> > > >Strongly +1 for making incremental checkpoints as default. Many
> > > >users
> > > >have
> > > >ever been asking why this configuration is not enabled by
> > > >default.
> > > >
> > > >BTW, from my knowledge, nothing would happen for
> > > >HashMapStateBackend,
> > > >which does not support incremental checkpoint yet, when
> > > >enabling
> > > >incremental checkpoints.
> > > >
> > > >
> > > >Best
> > > >Yun Tang
> > > >________________________________
> > > >From: Martijn Visser <martijnvis...@apache.org>
> > > >Sent: Monday, June 13, 2022 18:05
> > > >To: dev@flink.apache.org <dev@flink.apache.org>
> > > >Subject: Re: [DISCUSS ] Make state.backend.incremental as true
> > > >by
> > > >default
> > > >
> > > >Hi Lihe,
> > > >
> > > >What happens if we enable incremental checkpoints by default
> > > >while
> > > >the
> > > >used
> > > >memory backend is HashMapStateBackend, which doesn't support
> > > >incremental
> > > >checkpoints?
> > > >
> > > >Best regards,
> > > >
> > > >Martijn
> > > >
> > > >Op ma 13 jun. 2022 om 11:59 schreef Lihe Ma <ma_l...@163.com>:
> > > >
> > > >Hi, Everyone,
> > > >
> > > >I would like to open a discussion on setting incremental
> > > >checkpoint
> > > >as
> > > >default behavior.
> > > >
> > > >Currently, the configuration of state.backend.incremental is
> > > >set
> > > >as
> > > >false
> > > >by default. Incremental checkpoint has been adopted widely in
> > > >industry
> > > >community for many years , and it is also well-tested from
> > > >the
> > > >feedback
> > > >in
> > > >the community discussion. Incremental checkpointing is more
> > > >light-weighted:
> > > >shorter checkpoint duration, less uploaded data and less
> > > >resource
> > > >consumption.
> > > >
> > > >In terms of backward compatibility, enable incremental
> > > >checkpointing
> > > >would
> > > >not make any data loss no matter restoring from a full
> > > >checkpoint/savepoint
> > > >or an incremental checkpoint.
> > > >
> > > >FLIP-193 (Snapshot ownership)[1] has been released in 1.15,
> > > >incremental
> > > >checkpoint no longer depends on a previous restored
> > > >checkpoint
> > > >in
> > > >default
> > > >NO_CLAIM mode, which makes the checkpoint lineage much
> > > >cleaner,
> > > >it
> > > >is a
> > > >good chance to change the configuration
> > > >state.backend.incremental
> > > >to
> > > >true
> > > >as default.
> > > >
> > > >Thus, based on the above discussion, I suggest to make
> > > >state.backend.incremental as true by default. What do you
> > > >think
> > > >of
> > > >this
> > > >proposal?
> > > >
> > > >[1]
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
> > > >
> > > >Best regards,
> > > >Lihe Ma
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>

Reply via email to