[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.
[ https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336207#comment-17336207 ]

Flink Jira Bot commented on FLINK-17487:

This issue was labeled "stale-major" 7 days ago and has not received any updates, so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> Do not delete old checkpoints when stop the job.
>
> Key: FLINK-17487
> URL: https://issues.apache.org/jira/browse/FLINK-17487
> Project: Flink
> Issue Type: Improvement
> Components: Client / Job Submission, Runtime / Checkpointing
> Reporter: nobleyd
> Priority: Major
> Labels: stale-major
>
> When a Flink job is stopped with 'flink stop jobId', its checkpoint data is deleted.
> If the stop action does not succeed, or fails because of some unknown error, the job sometimes resumes from the latest checkpoint, but sometimes it simply fails, and by then the checkpoint data is gone.
> You may ask why I need these checkpoints, since stopping the job generates a savepoint. For example, my job uses a Kafka source; if Kafka missed some data, I want to stop the job and resume it from an older checkpoint. In any case, the stop action sometimes fails and the checkpoint data is deleted anyway, which is not good.
> This is different from 'flink cancel jobId' or 'flink savepoint jobId', which do not delete the checkpoint data.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.
[ https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327858#comment-17327858 ]

Flink Jira Bot commented on FLINK-17487:

This major issue is unassigned, and neither it nor any of its sub-tasks has been updated for 30 days, so it has been labeled "stale-major". If this ticket is indeed "major", please either assign yourself or give an update, and then remove the label. In 7 days the issue will be deprioritized.
[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.
[ https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109386#comment-17109386 ]

Congxian Qiu(klion26) commented on FLINK-17487:

[~nobleyd] Do you mean that the stop command did not succeed and the previous checkpoints were deleted anyway? That seems odd to me. Could you please share more logs with us (JobManager and TaskManager logs, preferably with debug logging enabled)?
[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.
[ https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107341#comment-17107341 ]

nobleyd commented on FLINK-17487:

[~rmetzger] No. If I cancel the job, the checkpoints are retained, as the documentation says. But if I 'stop' the job, which generates a savepoint, the checkpoints are cleared. Sometimes the savepoint is not generated successfully and the job fails because of the 'stop' command, and then I cannot find any checkpoints at all. Besides, sometimes I stop the job and generate a savepoint, but what I actually want is to restart the job from an earlier checkpoint, and no earlier checkpoints are left.
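To make the asymmetry in this reply concrete, here is a minimal, hypothetical model of whether retained checkpoints are deleted when a job reaches a terminal state. It is a sketch, not Flink's API: `CheckpointCleanupModel`, `Retention`, `Terminal` and `shouldDiscard` are illustrative names; the retention values only mirror Flink's `DELETE_ON_CANCELLATION` / `RETAIN_ON_CANCELLATION` modes; and the rule that a job ended by 'flink stop' (assumed here to finish in the FINISHED state) always discards its checkpoints is taken from the behaviour reported in this thread, not from Flink's source.

```java
/**
 * Hypothetical model (not Flink API) of whether retained checkpoints
 * survive a job reaching a terminal state, as reported in FLINK-17487.
 */
public class CheckpointCleanupModel {

    /** Retention modes; names only mirror Flink's externalized-checkpoint options. */
    enum Retention { NONE, DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION }

    /** Terminal job states; assumption: 'flink stop' ends the job as FINISHED. */
    enum Terminal { FINISHED, CANCELED, FAILED }

    /** Returns true if the job's checkpoints would be deleted in this model. */
    static boolean shouldDiscard(Retention retention, Terminal status) {
        switch (status) {
            case FINISHED:
                // Assumed from the reported behaviour: a stopped (finished) job
                // cleans up its checkpoints regardless of the retention mode.
                return true;
            case CANCELED:
                // Only RETAIN_ON_CANCELLATION keeps checkpoints on 'flink cancel'.
                return retention != Retention.RETAIN_ON_CANCELLATION;
            case FAILED:
                // Retained (externalized) checkpoints are kept when the job fails;
                // assumed: non-retained checkpoints are cleaned up.
                return retention == Retention.NONE;
            default:
                throw new IllegalArgumentException("unknown status: " + status);
        }
    }

    public static void main(String[] args) {
        for (Terminal t : Terminal.values()) {
            System.out.println(t + " -> discard="
                    + shouldDiscard(Retention.RETAIN_ON_CANCELLATION, t));
        }
    }
}
```

With `RETAIN_ON_CANCELLATION`, the model reports a discard only for FINISHED, which is exactly the 'stop' case this issue asks to change.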
[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.
[ https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104553#comment-17104553 ]

Robert Metzger commented on FLINK-17487:

Are retained checkpoints what you are looking for? https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#retained-checkpoints
[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.
[ https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103613#comment-17103613 ]

nobleyd commented on FLINK-17487:

I think there are only two cases in which checkpoints should be deleted:
* the number of checkpoints has reached the configured maximum, so the earliest checkpoint should be deleted;
* the user deletes a checkpoint manually.

Put another way, I do not think it is reasonable to delete checkpoints on 'stop' while retaining them on 'cancel'. I think 'cancel' means we do not need the job any more, while 'stop' only means we want to stop it and continue some time later, or just stop it to analyze some errors.
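The retention rule proposed in this comment can be sketched as a bounded store that evicts the oldest checkpoint only once a configured maximum is exceeded (similar in spirit to Flink's `state.checkpoints.num-retained` setting), deletes a checkpoint only on explicit request, and does nothing when the job terminates. `RetainedCheckpoints` and its methods are hypothetical names for illustration, not Flink classes, and checkpoints are reduced to numeric IDs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch (not Flink API) of the rule proposed in FLINK-17487:
 * a checkpoint is deleted only when the retained count exceeds the
 * configured maximum, or when a user deletes it manually.
 */
public class RetainedCheckpoints {
    private final int maxRetained;
    private final Deque<Long> checkpointIds = new ArrayDeque<>(); // oldest first
    private final List<Long> deleted = new ArrayList<>();

    RetainedCheckpoints(int maxRetained) {
        if (maxRetained < 1) {
            throw new IllegalArgumentException("maxRetained must be >= 1");
        }
        this.maxRetained = maxRetained;
    }

    /** Records a completed checkpoint and evicts the oldest ones beyond the limit. */
    void add(long checkpointId) {
        checkpointIds.addLast(checkpointId);
        while (checkpointIds.size() > maxRetained) {
            deleted.add(checkpointIds.removeFirst());
        }
    }

    /** Manual deletion: the only other way a checkpoint is removed. */
    boolean deleteManually(long checkpointId) {
        boolean removed = checkpointIds.remove(checkpointId);
        if (removed) {
            deleted.add(checkpointId);
        }
        return removed;
    }

    /** Job termination ('stop' or 'cancel'): nothing is deleted under this rule. */
    void onJobTerminated() {
        // Intentionally empty: retained checkpoints outlive the job.
    }

    List<Long> retained() { return new ArrayList<>(checkpointIds); }

    List<Long> deleted() { return new ArrayList<>(deleted); }

    public static void main(String[] args) {
        RetainedCheckpoints store = new RetainedCheckpoints(3);
        for (long id = 1; id <= 5; id++) {
            store.add(id);
        }
        store.onJobTerminated(); // e.g. after 'flink stop'
        System.out.println("retained=" + store.retained() + " deleted=" + store.deleted());
    }
}
```

With a limit of 3 and checkpoints 1 to 5, terminating the job leaves checkpoints 3, 4 and 5 in place; only 1 and 2 were evicted, and only because the limit was exceeded.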
[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.
[ https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103612#comment-17103612 ]

nobleyd commented on FLINK-17487:

Someone may say that if I need to restart the job from an older checkpoint, I should cancel it rather than stop it (stopping generates a savepoint and deletes all checkpoints). However, the reason I use 'stop' rather than 'cancel' is safety, as a precaution against unexpected errors.