Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Tzu-Li (Gordon) Tai Sun, 16 Jun 2019 19:15:55 -0700

Thanks for the inputs Yu and Aljoscha!

I agree to rename this FLIP. Will call it "Unified binary format for Keyed
State".


I will proceed to open a VOTE thread to formally adopt the FLIP now.

On Fri, Jun 14, 2019 at 10:03 PM Aljoscha Krettek <aljos...@apache.org>
wrote:

> Please also see my comment on
> https://issues.apache.org/jira/browse/FLINK-12619?focusedCommentId=16864098&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16864098
>
> For this FLIP-41 it means we go forward with the design basically as is
> but should call it “Unified Format” or something like it.
>
> If no-one else comments, we should proceed to a [VOTE] thread to formally
> adopt the FLIP.
>
> Aljoscha
>
> > On 14. Jun 2019, at 15:40, Yu Li <l...@apache.org> wrote:
> >
> > Hi Aljoscha and all,
> >
> > My 2 cents here:
> >
> > 1. Conceptually, it is worth a second thought about introducing an optimized
> > snapshot format for now (i.e. use checkpoint format in savepoint), just
> > like it's not recommended to use a snapshot for backup in a database
> > (although practically it could be implemented).
> >
> > 2. The stop-with-checkpoint mechanism is like stopping a database instance
> > with a data flush, and thus (IMHO) a different story from the
> > checkpoint/savepoint (db snapshot/backup) distinction.
> >
> > 3. In the long run we may improve checkpoints to allow a short enough
> > interval that they become some form of transactional log; then we could
> > enable checkpoint-based savepoints (like transactional-log-based backup),
> > so I agree to still call the new format in FLIP-41 a "Unified Format",
> > although in the short term it only unifies savepoints.
> >
> > I've also written a document [1] with more details; please refer to it
> > if interested. Thanks!
> >
> > [1] https://docs.google.com/document/d/1uE4R3wNal6e67FkDe0UvcnsIMMDpr35j
> >
> > Best Regards,
> > Yu
> >
> >
> > On Thu, 6 Jun 2019 at 19:42, Aljoscha Krettek <aljos...@apache.org>
> wrote:
> >
> >> Btw, I think this FLIP is a very good effort, we just need to reframe the
> >> effort a tiny bit. +1
> >>
> >>> On 6. Jun 2019, at 13:41, Aljoscha Krettek <aljos...@apache.org>
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I had a brief discussion with Stephan that helped me sort my thoughts
> on
> >> the broader topics of checkpoints, savepoints, binary formats,
> >> user-triggered checkpoints, and periodic savepoints. I’ll try to
> summarise
> >> my stance on this and also comment with the same message on the other
> >> relevant Jira Issues and threads.
> >>>
> >>> For reference, the relevant FLIP and Jira issues are these:
> >>>
> >>> - https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints:
> >> Unified Savepoint Format
> >>> - https://issues.apache.org/jira/browse/FLINK-12619: Add support for
> >> stop-with-checkpoint
> >>> - https://issues.apache.org/jira/browse/FLINK-6755: User-triggered
> >> checkpoints
> >>> - https://issues.apache.org/jira/browse/FLINK-4620: Automatically
> >> creating savepoints
> >>> - https://issues.apache.org/jira/browse/FLINK-4511: Schedule periodic
> >> savepoints
> >>>
> >>> There are roughly two different dimensions in the topic of
> >> savepoints/checkpoints (I’ll use snapshot as the generic term for both):
> >>> 1) who controls the snapshot
> >>> 2) what’s the (binary) format of the snapshot
> >>>
> >>> For 1), we currently have checkpoints and savepoints. Checkpoints are
> >> created by the system for fault tolerance. They are managed by the system
> >> and the system is free to discard them when it sees fit. Savepoints are in
> >> the control of the user. A user can choose to create a savepoint, delete
> >> it, and restore from it at will. The system will not clean up savepoints.
> >> We should try and keep this separation and not muddle the two concepts.
> >>>
> >>> For 2), we currently have various different formats between the
> >> different state backends and also for the same backend. For example,
> >> RocksDB can do full or incremental snapshots, local snapshots, and
> >> probably more.
> >>>
> >>> FLIP-41 aims at introducing a unified “savepoint” format that is
> >> interchangeable between the different state backends. In light of the
> above
> >> points, we should say that FLIP-41 aims to introduce a canonical format
> >> that is interchangeable between different backends. This doesn’t mean
> that
> >> we should tie this format strictly to savepoints, though. For
> performance
> >> reasons, users might choose to do savepoints that use one of the
> optimised
> >> formats that the backends offer, for example incremental snapshots. Or
> they
> >> might choose to use the canonical format for regular checkpoints so that
> >> they can always switch between backends using periodically created
> >> externalised checkpoints.
> >>>
> >>> The motivation behind FLINK-12619 is to have a more lightweight
> >> alternative for stop-with-savepoint, for example using the incremental
> >> snapshot format that RocksDB has. With the above in mind, however, this
> >> becomes “Add support for choosing the snapshot format for
> >> stop-with-savepoint”. It should not be stop-with-checkpoint, because
> >> checkpoints are something that the system manages and not something that
> >> the user should trigger. The same is true for FLINK-6755, the
> motivation is
> >> the same I think. The change should be called “Add support for choosing
> the
> >> snapshot format for savepoints”, however.
> >>>
> >>> For the last two Jira issues mentioned above it should be quite clear
> >> what I think. I do, however, see a need for potentially different
> >> overlapping checkpoint periods or intervals. Users might want to have
> >> their regular checkpoints use an optimised format but they also want to
> >> have a “canonical format” checkpoint every now and then so that the
> >> lineage of incremental checkpoints does not become too unwieldy.
> >>>
> >>> Please let me know what you think!
> >>>
> >>> Aljoscha
> >>>
> >>>> On 5. Jun 2019, at 10:36, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> >> wrote:
> >>>>
> >>>> I want to quickly bump this discussion to gather more consensus from
> >> others
> >>>> on the FLIP, and see if we want to aim this for the upcoming 1.9.0
> >> release.
> >>>> The proposal touches binary formats of savepoints, which is a major
> >> part of
> >>>> Flink's public user interface, so having explicit approval from other
> >>>> members of the community would be nice here.
> >>>>
> >>>> Cheers,
> >>>> Gordon
> >>>>
> >>>> On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <
> >> tzuli...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> I also should point out something that I forgot to mention in the
> >> initial
> >>>>> post:
> >>>>> Stefan has helped a lot in understanding the current status of state
> >>>>> backends and also participated a lot in design choices for the FLIP
> :)
> >>>>>
> >>>>> On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <
> >> tzuli...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Flink devs,
> >>>>>>
> >>>>>> Congxian, Kostas, and I have recently been discussing unifying the
> >>>>>> binary formats for keyed state in savepoints, which would allow for more
> >>>>>> operational flexibility such as swapping state backends across restores.
> >>>>>>
> >>>>>> As part of this FLIP, another main proposal is to start allowing
> >>>>>> checkpoints and savepoints to have different formats. Savepoint formats
> >>>>>> should in the future be designed with interoperability in mind, with
> >>>>>> reasonable snapshot / restore overhead being tolerable, while checkpoints
> >>>>>> are allowed to be backend specific for more efficient snapshots and
> >>>>>> restores. Given recent proposals for the state backends, such as the
> >>>>>> disk-spilling heap backend [1], this flexibility seems to be reasonable.
> >>>>>>
> >>>>>> The main user-facing API this would affect is, of course, the binary
> >>>>>> formats of savepoints, as well as the fact that we will no longer be
> >>>>>> guaranteeing functional parity between savepoints and full checkpoints
> >>>>>> in the future (w.r.t. operational features related to upgrading
> >>>>>> applications; so far they have equal functionality).
> >>>>>>
> >>>>>> Therefore, we would like to collect feedback on the proposal before
> >>>>>> continuing efforts.
> >>>>>>
> >>>>>> This is the FLIP:
> >>>>>>
> >>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
> >>>>>>
> >>>>>> I'm happy to discuss details and am looking forward to any feedback.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Gordon
> >>>>>>
> >>>>>> [1]
> >>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
> >>>>>>
> >>>>>
> >>>
> >>
> >>
>
>
