[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.

nobleyd (Jira) Sat, 09 May 2020 20:12:18 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103613#comment-17103613
 ]


nobleyd commented on FLINK-17487:
---------------------------------

I think there are only two cases in which checkpoints should be deleted:
 * the number of checkpoints has reached the configured maximum, so the 
oldest checkpoint should be deleted.
 * a checkpoint is deleted manually.
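
To make the two cases concrete, here is a toy Python sketch of the retention 
policy described above. It is not Flink code: CheckpointStore, max_retained, 
and on_job_end are hypothetical names used only for illustration.

```python
from collections import deque


class CheckpointStore:
    """Toy model of the proposed retention policy (hypothetical, not Flink code)."""

    def __init__(self, max_retained):
        if max_retained < 1:
            raise ValueError("max_retained must be at least 1")
        self.max_retained = max_retained
        self.retained = deque()  # oldest checkpoint sits on the left

    def add(self, checkpoint_id):
        """Case 1: register a checkpoint and evict the oldest ones over the limit."""
        self.retained.append(checkpoint_id)
        evicted = []
        while len(self.retained) > self.max_retained:
            evicted.append(self.retained.popleft())
        return evicted

    def delete(self, checkpoint_id):
        """Case 2: explicit manual deletion of one checkpoint."""
        self.retained.remove(checkpoint_id)

    def on_job_end(self, action):
        """Under this policy neither 'stop' nor 'cancel' deletes anything."""
        if action not in ("stop", "cancel"):
            raise ValueError("unknown action: " + action)
        return list(self.retained)


store = CheckpointStore(max_retained=3)
for checkpoint_id in range(1, 6):
    store.add(checkpoint_id)
print(store.on_job_end("stop"))  # the three newest checkpoints are still there
```

In this sketch the only code paths that remove a checkpoint are add() (the 
limit is exceeded) and delete() (manual removal); ending the job leaves the 
retained set untouched.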

Also, I do not think it is reasonable to delete checkpoints on 'stop' while 
retaining them on 'cancel'. I think 'cancel' means we no longer need the job 
at all, while 'stop' only means we want to halt it and resume it some time 
later, or halt it to analyze some errors.

> Do not delete old checkpoints when stop the job.
> ------------------------------------------------
>
>                 Key: FLINK-17487
>                 URL: https://issues.apache.org/jira/browse/FLINK-17487
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client / Job Submission, Runtime / Checkpointing
>            Reporter: nobleyd
>            Priority: Major
>
> When a Flink job is stopped using 'flink stop jobId', the checkpoint data 
> is deleted. 
> When the stop action does not succeed, or fails because of some unknown 
> error, sometimes the job resumes from the latest checkpoint, while 
> sometimes it just fails and the checkpoint data is gone.
> You may ask why I need these checkpoints, since stopping the job generates 
> a savepoint. For example, my job uses a Kafka source, and Kafka missed some 
> data, so I want to stop the job and resume it from an old checkpoint. In 
> any case, my point is that sometimes the stop action fails and the 
> checkpoint data is deleted as well, which is not good. 
> This behavior differs from 'flink cancel jobId' or 'flink savepoint 
> jobId', which do not delete the checkpoint data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
