[ https://issues.apache.org/jira/browse/FLINK-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857747#comment-16857747 ]
Aljoscha Krettek commented on FLINK-12619:
------------------------------------------

The documentation states what I also described: savepoints are user-controlled and checkpoints are system-controlled. That's why I think efforts such as allowing user-triggered checkpoints (which includes a user-triggered stop-with-checkpoint) break that distinction between user control and system control.

That being said, there are clear use cases where you want a more light-weight savepoint, and for those we should allow taking a savepoint in a more efficient format (i.e. the format that checkpoints normally use). The "canonical format" that (in my opinion) FLIP-41 will introduce can be used to create savepoints that are compatible between backends. What I'm saying, however, is that we should not strictly tie this to only savepoints. With the changes of FLIP-41, the default config for taking a savepoint might be "use the canonical but slow format" and the default for checkpoints might be "use the optimized (incremental) format". But users can choose to do a stop-with-savepoint using the optimized incremental format because they know that they don't want to switch to a different state backend and would like the speed and size benefit of the faster format.

> Support TERMINATE/SUSPEND Job with Checkpoint
> ---------------------------------------------
>
> Key: FLINK-12619
> URL: https://issues.apache.org/jira/browse/FLINK-12619
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / State Backends
> Reporter: Congxian Qiu(klion26)
> Assignee: Congxian Qiu(klion26)
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Inspired by the idea of FLINK-11458, we propose to support terminate/suspend
> a job with checkpoint. This improvement works together with the incremental
> and external checkpoint features: if checkpoints are retained and this
> feature is configured, we will trigger a checkpoint before the job stops.
> It could accelerate job recovery a lot since:
> 1. No source rewinding is required any more.
> 2. It's much faster than taking a savepoint since incremental checkpoint is
> enabled.
> Please note that conceptually savepoints are different from checkpoints in a
> similar way that backups are different from recovery logs in traditional
> database systems. So we suggest using this feature only for job recovery,
> while sticking with FLINK-11458 for the
> upgrading/cross-cluster-job-migration/state-backend-switch cases.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)