[ 
https://issues.apache.org/jira/browse/FLINK-6328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019720#comment-16019720
 ] 

ASF GitHub Bot commented on FLINK-6328:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/3965

    [FLINK-6328] [chkPts] Don't add savepoints to CompletedCheckpointStore

    The lifecycle of savepoints is not managed by the CheckpointCoordinator; it is
    entirely in the hands of the user. Therefore, the CheckpointCoordinator cannot
    rely on savepoints when trying to recover from failures. For example, a user
    moving a savepoint shortly before a failure could completely break Flink's
    recovery mechanism, because Flink cannot skip failed checkpoints when recovering.
    
    Therefore, until Flink is able to skip failed checkpoints when recovering, we
    should not add savepoints to the CompletedCheckpointStore, which is used to
    retrieve checkpoints for recovery. Savepoints are identified on the basis of
    their CheckpointProperties (CheckpointProperties.STANDARD_SAVEPOINT).
    
    cc @rmetzger 
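
As a rough illustration of the idea in this description, the sketch below keeps savepoints out of the store that recovery reads from. All class and method names here (SketchCheckpoint, SketchCheckpointStore, SketchCoordinator, onCompleted) are hypothetical stand-ins and not Flink's real classes; the boolean flag stands in for the CheckpointProperties check, and the actual patch may look quite different.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a completed checkpoint; NOT Flink's real class.
final class SketchCheckpoint {
    final long id;
    final boolean isSavepoint; // stands in for the CheckpointProperties check

    SketchCheckpoint(long id, boolean isSavepoint) {
        this.id = id;
        this.isSavepoint = isSavepoint;
    }
}

// Hypothetical store that retains only the newest maxRetained entries.
final class SketchCheckpointStore {
    private final Deque<SketchCheckpoint> retained = new ArrayDeque<>();
    private final int maxRetained;

    SketchCheckpointStore(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    void add(SketchCheckpoint checkpoint) {
        retained.addLast(checkpoint);
        // Evict the oldest entries once the retention limit is exceeded.
        while (retained.size() > maxRetained) {
            retained.removeFirst();
        }
    }

    // Returns the newest retained entry, or null if the store is empty.
    SketchCheckpoint latest() {
        return retained.peekLast();
    }

    int size() {
        return retained.size();
    }
}

// Hypothetical coordinator: savepoints complete normally but are never
// handed to the store that recovery reads from.
final class SketchCoordinator {
    private final SketchCheckpointStore store;

    SketchCoordinator(SketchCheckpointStore store) {
        this.store = store;
    }

    void onCompleted(SketchCheckpoint checkpoint) {
        if (!checkpoint.isSavepoint) {
            store.add(checkpoint);
        }
    }
}
```

With a retention limit of 1, completing checkpoint 1 and then savepoint 2 leaves checkpoint 1 in the store, so recovery does not depend on a file the user may have moved.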

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixSavepointHandling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3965.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3965
    
----
commit 9c069ad80d66f03a0f90c8ba1a780cbba111896e
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2017-05-22T15:41:14Z

    [FLINK-6328] [chkPts] Don't add savepoints to CompletedCheckpointStore
    

----


> Savepoints must not be counted as retained checkpoints
> ------------------------------------------------------
>
>                 Key: FLINK-6328
>                 URL: https://issues.apache.org/jira/browse/FLINK-6328
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stephan Ewen
>            Assignee: Till Rohrmann
>            Priority: Blocker
>             Fix For: 1.3.0, 1.2.2
>
>
> The Checkpoint Store retains the *n* latest checkpoints.
> Savepoints are counted as well, meaning that with a setting of 1 retained 
> checkpoint, there are sometimes no retained checkpoints at all, only a 
> savepoint.
> That is dangerous, because savepoints must be assumed to disappear at any 
> point in time: their lifecycle is outside the control of the 
> CheckpointCoordinator.
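
The retention problem described in the issue can be reproduced with a small simulation. This is only a sketch under assumptions: RetentionSketch and retain are hypothetical names, and the convention that labels starting with "sp" denote savepoints is invented for the example; none of this is Flink code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of an "n latest" retention policy; not Flink code.
final class RetentionSketch {

    // Replays the completed snapshots in order and returns what a store with
    // retention limit n would still hold afterwards. Labels starting with
    // "sp" are treated as savepoints (an assumption made for this example).
    static List<String> retain(List<String> completed, int n, boolean countSavepoints) {
        List<String> kept = new ArrayList<>();
        for (String label : completed) {
            boolean isSavepoint = label.startsWith("sp");
            if (isSavepoint && !countSavepoints) {
                continue; // savepoints never enter the store
            }
            kept.add(label);
            if (kept.size() > n) {
                kept.remove(0); // drop the oldest entry
            }
        }
        return kept;
    }
}
```

For the sequence chk-1, sp-2 with n = 1, counting savepoints leaves only sp-2 in the store, which is the dangerous case from the issue; not counting them leaves chk-1.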



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
