[ https://issues.apache.org/jira/browse/FLINK-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019720#comment-16019720 ]
ASF GitHub Bot commented on FLINK-6328:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/3965

    [FLINK-6328] [chkPts] Don't add savepoints to CompletedCheckpointStore

    The lifecycle of savepoints is not managed by the CheckpointCoordinator and fully
    in the hand of the user. Therefore, the CheckpointCoordinator cannot rely on them
    when trying to recover from failures. E.g. a user moving a savepoint shortly before a
    failure could completely break Flink's recovery mechanism because Flink cannot
    skip failed checkpoints when recovering.

    Therefore, until Flink is able to skip failed checkpoints when recovering, we should
    not add savepoints to the CompletedCheckpointStore which is used to retrieve checkpoint
    for recovery. The distinction of a savepoint is done on the basis of the CheckpointProperties
    (CheckpointProperties.STANDARD_SAVEPOINT).

    cc @rmetzger

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixSavepointHandling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3965.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3965

----
commit 9c069ad80d66f03a0f90c8ba1a780cbba111896e
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-05-22T15:41:14Z

    [FLINK-6328] [chkPts] Don't add savepoints to CompletedCheckpointStore

    The lifecycle of savepoints is not managed by the CheckpointCoordinator and fully
    in the hand of the user. Therefore, the CheckpointCoordinator cannot rely on them
    when trying to recover from failures. E.g. a user moving a savepoint shortly before a
    failure could completely break Flink's recovery mechanism because Flink cannot
    skip failed checkpoints when recovering.

    Therefore, until Flink is able to skip failed checkpoints when recovering, we should
    not add savepoints to the CompletedCheckpointStore which is used to retrieve checkpoint
    for recovery. The distinction of a savepoint is done on the basis of the CheckpointProperties
    (CheckpointProperties.STANDARD_SAVEPOINT).

----


> Savepoints must not be counted as retained checkpoints
> ------------------------------------------------------
>
>                 Key: FLINK-6328
>                 URL: https://issues.apache.org/jira/browse/FLINK-6328
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stephan Ewen
>            Assignee: Till Rohrmann
>            Priority: Blocker
>             Fix For: 1.3.0, 1.2.2
>
>
> The Checkpoint Store retains the *n* latest checkpoints.
> Savepoints are counted as well, meaning that for settings with 1 retained
> checkpoint, there are sometimes no retained checkpoints at all, only a
> savepoint.
> That is dangerous, because savepoints must be assumed to disappear at any
> point in time - their lifecycle is out of control of the
> CheckpointCoordinator.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
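
The problem described in the issue can be illustrated with a small, self-contained sketch. It does not use Flink's classes: `Snapshot`, `BoundedStore`, and the two `complete...` methods are hypothetical names invented for illustration, and the real CompletedCheckpointStore and CheckpointProperties APIs are not reproduced here. The sketch assumes only what the issue states: the store keeps the n latest entries, and the fix is to not add savepoints to it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RetainedCheckpointSketch {

    /** Hypothetical stand-in for a completed snapshot; not Flink's CompletedCheckpoint. */
    static final class Snapshot {
        final long id;
        final boolean isSavepoint;

        Snapshot(long id, boolean isSavepoint) {
            this.id = id;
            this.isSavepoint = isSavepoint;
        }
    }

    /** Hypothetical store that retains only the n most recently added snapshots. */
    static final class BoundedStore {
        private final int maxRetained;
        private final Deque<Snapshot> retained = new ArrayDeque<>();

        BoundedStore(int maxRetained) {
            this.maxRetained = maxRetained;
        }

        void add(Snapshot snapshot) {
            retained.addLast(snapshot);
            // Evict the oldest entries once the retention limit is exceeded.
            while (retained.size() > maxRetained) {
                retained.removeFirst();
            }
        }

        List<Snapshot> getAll() {
            return new ArrayList<>(retained);
        }
    }

    /** Behavior described in the issue: savepoints count against the limit. */
    static void completeCountingSavepoints(BoundedStore store, Snapshot snapshot) {
        store.add(snapshot);
    }

    /** Behavior proposed in the pull request: savepoints never enter the store. */
    static void completeSkippingSavepoints(BoundedStore store, Snapshot snapshot) {
        if (!snapshot.isSavepoint) {
            store.add(snapshot);
        }
    }

    /** Counts the retained entries whose lifecycle the coordinator controls. */
    static long managedCheckpoints(BoundedStore store) {
        return store.getAll().stream().filter(s -> !s.isSavepoint).count();
    }

    public static void main(String[] args) {
        Snapshot checkpoint = new Snapshot(1, false);
        Snapshot savepoint = new Snapshot(2, true);

        BoundedStore before = new BoundedStore(1);
        completeCountingSavepoints(before, checkpoint);
        completeCountingSavepoints(before, savepoint);

        BoundedStore after = new BoundedStore(1);
        completeSkippingSavepoints(after, checkpoint);
        completeSkippingSavepoints(after, savepoint);

        System.out.println("managed checkpoints, savepoints counted: " + managedCheckpoints(before));
        System.out.println("managed checkpoints, savepoints skipped: " + managedCheckpoints(after));
    }
}
```

With one retained entry, the first variant ends up holding only the savepoint, so nothing under the coordinator's control is left to recover from; the second variant still holds the checkpoint.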