[jira] [Commented] (FLINK-3397) Failed streaming jobs should fall back to the most recent checkpoint/savepoint

ramkrishna.s.vasudevan (JIRA) Sun, 24 Jul 2016 09:41:56 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391103#comment-15391103
 ]


ramkrishna.s.vasudevan commented on FLINK-3397:
-----------------------------------------------

[~uce]
bq.I fear though that these changes require some more consideration about how 
savepoints are stored/accessed. They are currently mostly independent of the 
job from which they were created.
I read through the code. The CheckpointIDCounter (ZooKeeperCheckpointIDCounter) 
tries to create a counter per job ID, using the job ID path in ZooKeeper. 
Doesn't that mean savepoints and checkpoints are stored and accessed per job 
only? If not, then I am missing something. Please correct me if I am wrong.
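
To illustrate what "a counter per job ID" means, here is a minimal sketch. It 
is not Flink's code: the in-memory map stands in for ZooKeeper, and the class 
name, method names and the "/checkpoint-counter/<jobId>" path layout are all 
hypothetical. It only shows ID allocation being scoped to a job; it says 
nothing about where savepoint data is stored.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: an in-memory stand-in for a ZooKeeper-backed counter.
// The path layout and all names here are hypothetical, not Flink's code.
class PerJobCounterSketch {
    // Maps a znode-like path to the counter stored under it.
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Hypothetical layout: one counter node per job ID.
    static String counterPath(String jobId) {
        return "/checkpoint-counter/" + jobId;
    }

    // Returns the current ID for the given job, then increments its counter.
    long getAndIncrement(String jobId) {
        return counters
                .computeIfAbsent(counterPath(jobId), path -> new AtomicLong(1))
                .getAndIncrement();
    }

    public static void main(String[] args) {
        PerJobCounterSketch sketch = new PerJobCounterSketch();
        System.out.println(sketch.getAndIncrement("job-a"));
        System.out.println(sketch.getAndIncrement("job-a"));
        System.out.println(sketch.getAndIncrement("job-b"));
    }
}
```

Because each job ID maps to its own path, two jobs never share a counter, 
which is the per-job scoping I am asking about.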

> Failed streaming jobs should fall back to the most recent checkpoint/savepoint
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-3397
>                 URL: https://issues.apache.org/jira/browse/FLINK-3397
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing, Streaming
>    Affects Versions: 1.0.0
>            Reporter: Gyula Fora
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Minor
>         Attachments: FLINK-3397.pdf
>
>
> The current fallback behaviour in case of a streaming job failure is slightly 
> counterintuitive:
> If a job fails, it falls back to the most recent checkpoint (if any), even 
> if a more recent savepoint was taken. This means that the system does not 
> regard savepoints as checkpoints, only as points from which a job can be 
> manually restarted.
> I suggest changing this so that savepoints are also regarded as checkpoints 
> in case of a failure and are also used to automatically restore the 
> streaming job.
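
The proposed fallback rule can be sketched as follows. This is a minimal 
illustration, not Flink's recovery code: the RestorePoint type, its fields and 
the latest() helper are hypothetical, and it assumes that "most recent" can be 
decided by comparing completion timestamps.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Illustrative only: all names are hypothetical, not Flink's API.
class RestorePointSketch {
    // A completed checkpoint or savepoint, identified by kind and timestamp.
    static final class RestorePoint {
        final String kind;           // "checkpoint" or "savepoint"
        final long timestampMillis;  // assumed completion time

        RestorePoint(String kind, long timestampMillis) {
            this.kind = kind;
            this.timestampMillis = timestampMillis;
        }
    }

    // Proposed behaviour: consider checkpoints and savepoints together and
    // restore from whichever completed last. Empty if neither list has entries.
    static Optional<RestorePoint> latest(List<RestorePoint> checkpoints,
                                         List<RestorePoint> savepoints) {
        return Stream.concat(checkpoints.stream(), savepoints.stream())
                .max(Comparator.comparingLong((RestorePoint p) -> p.timestampMillis));
    }
}
```

Under the current behaviour described in the issue, only the checkpoint list 
would be consulted on failure; the sketch differs only in that savepoints are 
merged into the candidate set.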



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
