Hi Seth,

Thanks for starting this discussion. I really like this refactoring, which gives us cleaner concepts!
When we talk about the relationship between state, state backends, and snapshots, the 'CheckpointStorage' only focuses on how to persist the checkpointed state (to the JM or to a DFS). There still exist some concepts tied to the implementation of the state backend, e.g. the configuration options 'state.backend.incremental' and 'state.backend.async'. What do you think of these configurations? They are still related to checkpointing, but they are limited to features of the state backend.

Moreover, when talking about separating the state backend from checkpointing, I also want to give another two cents: a state backend, which holds the state, is a must-have whenever we use state in a streaming job, whereas checkpointing is not a must-have if we do not enable it. On the JM side, 'ExecutionGraph' can live without a checkpointCoordinator, while on the task side 'StreamTask' always initializes the 'checkpointStorage'. I think the JM knows the inner relationship between state backend and checkpointing, while the TM seems to mix them together.

Best,
Yun Tang

________________________________
From: Konstantin Knauf <kna...@apache.org>
Sent: Wednesday, September 9, 2020 16:05
To: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

Thanks for the initiative. Big +1. Would be interested to hear if the proposed interfaces still make sense in the face of the new fault-tolerance work that is planned. Stephan/Piotr will know.

On Tue, Sep 8, 2020 at 7:05 PM Seth Wiesman <sjwies...@gmail.com> wrote:

> Hi Devs,
>
> I'd like to propose an update to how state backends and checkpoint storage
> are configured to help users better understand Flink.
>
> Apache Flink's durability story is a mystery to many users. One of the most
> common recurring questions from users comes from not understanding the
> relationship between state, state backends, and snapshots.
> Some of this confusion can be abated with learning material but the question is so
> pervasive that we believe Flink's user APIs should better communicate
> what different components are responsible for.
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-142%3A+Disentangle+StateBackends+from+Checkpointing
>
> I look forward to a healthy discussion.
>
> Seth

--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk