Thanks for the input, Yu and Aljoscha! I agree to rename this FLIP and will call it "Unified binary format for Keyed State".
I will proceed to open a VOTE thread to formally adopt the FLIP now.

On Fri, Jun 14, 2019 at 10:03 PM Aljoscha Krettek <aljos...@apache.org> wrote:

> Please also see my comment on
> https://issues.apache.org/jira/browse/FLINK-12619?focusedCommentId=16864098&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16864098
>
> For this FLIP-41 it means we go forward with the design basically as is but should call it “Unified Format” or something like it.
>
> If no-one else comments, we should proceed to a [VOTE] thread to formally adopt the FLIP.
>
> Aljoscha
>
> > On 14. Jun 2019, at 15:40, Yu Li <l...@apache.org> wrote:
> >
> > Hi Aljoscha and all,
> >
> > My 2 cents here:
> >
> > 1. Conceptually it is worth a second thought about introducing an optimized snapshot format for now (i.e. use checkpoint format in savepoint), just like it's not recommended to use a snapshot for backup in a database (although practically it could be implemented).
> >
> > 2. The stop-with-checkpoint mechanism is like stopping a database instance with a data flush, and is thus (IMHO) a different story from the checkpoint/savepoint (db snapshot/backup) distinction.
> >
> > 3. In the long run we may improve the checkpoint to allow a short enough interval so that it may become some form of transactional log; then we could enable checkpoint-based savepoints (like transactional-log-based backup). So I agree to still call the new format in FLIP-41 a "Unified Format", although in the short term it only unifies savepoints.
> >
> > I've also written a document [1] to include more details; please refer to it if interested. Thanks!
> >
> > [1] https://docs.google.com/document/d/1uE4R3wNal6e67FkDe0UvcnsIMMDpr35j
> >
> > Best Regards,
> > Yu
> >
> >
> > On Thu, 6 Jun 2019 at 19:42, Aljoscha Krettek <aljos...@apache.org> wrote:
> >
> >> Btw, I think this FLIP is a very good effort, we just need to reframe the effort a tiny bit. +1
> >>
> >>> On 6. Jun 2019, at 13:41, Aljoscha Krettek <aljos...@apache.org> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I had a brief discussion with Stephan that helped me sort my thoughts on the broader topics of checkpoints, savepoints, binary formats, user-triggered checkpoints, and periodic savepoints. I’ll try to summarise my stance on this and also comment with the same message on the other relevant Jira Issues and threads.
> >>>
> >>> For reference, the relevant FLIP and Jira issues are these:
> >>>
> >>> - https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints: Unified Savepoint Format
> >>> - https://issues.apache.org/jira/browse/FLINK-12619: Add support for stop-with-checkpoint
> >>> - https://issues.apache.org/jira/browse/FLINK-6755: User-triggered checkpoints
> >>> - https://issues.apache.org/jira/browse/FLINK-4620: Automatically creating savepoints
> >>> - https://issues.apache.org/jira/browse/FLINK-4511: Schedule periodic savepoints
> >>>
> >>> There are roughly two different dimensions in the topic of savepoints/checkpoints (I’ll use snapshot as the generic term for both):
> >>> 1) who controls the snapshot
> >>> 2) what’s the (binary) format of the snapshot
> >>>
> >>> For 1), we currently have checkpoints and savepoints. Checkpoints are created by the system for fault tolerance. They are managed by the system and the system is free to discard them when it sees fit.
> >>> Savepoints are in the control of the user. A user can choose to create a savepoint, they can delete them, they can restore from them at will. The system will not clean up savepoints. We should try and keep this separation and not muddle the two concepts.
> >>>
> >>> For 2), we currently have various different formats between the different state backends and also for the same backend. For example, RocksDB can do full or incremental snapshots, local snapshots, and probably more.
> >>>
> >>> FLIP-41 aims at introducing a unified “savepoint” format that is interchangeable between the different state backends. In light of the above points, we should say that FLIP-41 aims to introduce a canonical format that is interchangeable between different backends. This doesn’t mean that we should tie this format strictly to savepoints, though. For performance reasons, users might choose to do savepoints that use one of the optimised formats that the backends offer, for example incremental snapshots. Or they might choose to use the canonical format for regular checkpoints so that they can always switch between backends using periodically created externalised checkpoints.
> >>>
> >>> The motivation behind FLINK-12619 is to have a more lightweight alternative for stop-with-savepoint, for example using the incremental snapshot format that RocksDB has. With the above in mind, however, this becomes “Add support for choosing the snapshot format for stop-with-savepoint”. It should not be stop-with-checkpoint, because checkpoints are something that the system manages and not something that the user should trigger. The same is true for FLINK-6755; the motivation is the same, I think. The change should be called “Add support for choosing the snapshot format for savepoints”, however.
> >>>
> >>> For the last two Jira issues mentioned above it should be quite clear what I think. I do, however, see a need for potentially different overlapping checkpoint periods or intervals. Users might want to have their regular checkpoints use an optimised format but they also want to have a “canonical format” checkpoint every now and then so that the lineage of incremental checkpoints does not become too unwieldy.
> >>>
> >>> Please let me know what you think!
> >>>
> >>> Aljoscha
> >>>
> >>>> On 5. Jun 2019, at 10:36, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> >>>>
> >>>> I want to quickly bump this discussion to gather more consensus from others on the FLIP, and see if we want to aim this for the upcoming 1.9.0 release.
> >>>> The proposal touches binary formats of savepoints, which is a major part of Flink's public user interface, so having explicit approval from other members of the community would be nice here.
> >>>>
> >>>> Cheers,
> >>>> Gordon
> >>>>
> >>>> On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> >>>>
> >>>>> I also should point out something that I forgot to mention in the initial post:
> >>>>> Stefan has helped a lot in understanding the current status of state backends and also participated a lot in design choices for the FLIP :)
> >>>>>
> >>>>> On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> >>>>>
> >>>>>> Hi Flink devs,
> >>>>>>
> >>>>>> Congxian, Kostas, and I have recently been discussing unifying the binary formats for keyed state in savepoints, which would allow for more operational flexibility such as swapping state backends across restores.
> >>>>>>
> >>>>>> As part of this FLIP, another main proposal is to start allowing checkpoints and savepoints to have different formats.
> >>>>>> Savepoint formats should in the future be designed with interoperability in mind, with reasonable snapshot / restore overhead being tolerable, while checkpoints are allowed to be backend specific for more efficient snapshots and restores.
> >>>>>> From recent proposals in the state backends such as the disk-spilling heap backend [1], this flexibility seems to be reasonable.
> >>>>>>
> >>>>>> The main user-facing API this would affect is, of course, the binary formats of savepoints, as well as the fact that we will no longer be guaranteeing functional parity between savepoints and full checkpoints in the future (w.r.t. operational features related to upgrading applications; so far they have equal functionality).
> >>>>>>
> >>>>>> Therefore, we would like to collect feedback on the proposal before continuing efforts.
> >>>>>>
> >>>>>> This is the FLIP:
> >>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
> >>>>>>
> >>>>>> I'm happy to discuss details and am looking forward to any feedback.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Gordon
> >>>>>>
> >>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html