[ https://issues.apache.org/jira/browse/FLINK-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387721#comment-15387721 ]

ASF GitHub Bot commented on FLINK-4201:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/2276
  
    I think this change is good in terms of "our" philosophy; I am just wondering about the following implication:
    
    Some people want to run HA setups without ZooKeeper, simply using an external service to make sure that the JobManager is restarted. The latest completed checkpoint is found via a well-defined path in the checkpoint storage (rather than a well-defined path in ZooKeeper).
    
    That works as an HA setup (with the exception that it is susceptible to "split brain" behavior in the presence of network partitions).
    
    Would we interfere with such a setup by removing checkpoints on "suspend" in "standalone" mode?
    
    BTW: We should really find a new name for "standalone" ;-) The term is overloaded.


> Checkpoints for jobs in non-terminal state (e.g. suspended) get deleted
> -----------------------------------------------------------------------
>
>                 Key: FLINK-4201
>                 URL: https://issues.apache.org/jira/browse/FLINK-4201
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Ufuk Celebi
>            Priority: Blocker
>
> For example, when shutting down a Yarn session, the logs show that 
> checkpoints of jobs that did not terminate are deleted. In the shutdown 
> hook, removeAllCheckpoints is called and removes checkpoints that should 
> still be kept.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
