[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.

2021-04-29 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17336207#comment-17336207
 ] 

Flink Jira Bot commented on FLINK-17487:


This issue was labeled "stale-major" 7 days ago and has not received any updates so 
it is being deprioritized. If this ticket is actually Major, please raise the 
priority and ask a committer to assign you the issue or revive the public 
discussion.


> Do not delete old checkpoints when stop the job.
> 
>
> Key: FLINK-17487
> URL: https://issues.apache.org/jira/browse/FLINK-17487
> Project: Flink
> Issue Type: Improvement
> Components: Client / Job Submission, Runtime / Checkpointing
> Reporter: nobleyd
> Priority: Major
> Labels: stale-major
>
> When a Flink job is stopped with 'flink stop jobId', its checkpoint data is 
> deleted. 
> When the stop action does not succeed, or fails because of some unknown error, 
> the job sometimes resumes from the latest checkpoint and sometimes just 
> fails, and the checkpoint data is gone.
> You may ask why I need these checkpoints, since stopping the job generates a 
> savepoint. For example, my job uses a Kafka source, Kafka missed some data, 
> and I want to stop the job and resume it from an older checkpoint. In any 
> case, the stop action sometimes fails and the checkpoint data is deleted as 
> well, which is not good. 
> This differs from 'flink cancel jobId' and 'flink savepoint jobId', which do 
> not delete the checkpoint data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.

2021-04-22 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327858#comment-17327858
 ] 

Flink Jira Bot commented on FLINK-17487:


This major issue is unassigned and itself and all of its Sub-Tasks have not 
been updated for 30 days. So, it has been labeled "stale-major". If this ticket 
is indeed "major", please either assign yourself or give an update. Afterwards, 
please remove the label. In 7 days the issue will be deprioritized.



[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.

2020-05-17 Thread Congxian Qiu(klion26) (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109386#comment-17109386
 ] 

Congxian Qiu(klion26) commented on FLINK-17487:
---

[~nobleyd] Do you mean that the stop command did not succeed and the previous 
checkpoint was deleted? That seems odd to me. Could you please share more logs 
with us (the JM and TM logs, preferably with debug logging enabled)?



[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.

2020-05-14 Thread nobleyd (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107341#comment-17107341
 ] 

nobleyd commented on FLINK-17487:
-

[~rmetzger] No. If I cancel the job, the checkpoints are retained, as the 
documentation says. But if I 'stop' the job and generate a savepoint, the 
checkpoints are cleared. Sometimes the savepoint is not generated 
successfully and the job fails because of the 'stop' command, and then I cannot 
find any checkpoints at all. Besides, sometimes I stop the job and generate a 
savepoint, but what I actually want is to restart the job from an earlier 
checkpoint, and no earlier checkpoints are left.



[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.

2020-05-11 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104553#comment-17104553
 ] 

Robert Metzger commented on FLINK-17487:


Are retained checkpoints what you are looking for? 
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#retained-checkpoints



[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.

2020-05-09 Thread nobleyd (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103613#comment-17103613
 ] 

nobleyd commented on FLINK-17487:
-

I think there are only two cases in which checkpoints should be deleted:
 * the number of checkpoints has reached the configured maximum, so the oldest 
checkpoint should be deleted.
 * someone deletes it manually.

In any case, I do not think it is reasonable to delete checkpoints on 'stop' 
while retaining them on 'cancel'. To me, 'cancel' means we no longer need the 
job, while 'stop' only means we want to pause it and continue later, or stop it 
to analyze some errors.
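
To make the proposal concrete, here is a rough sketch of the retention policy 
described above, written as a toy Python model. It is not Flink code and does 
not reflect how Flink manages checkpoints internally; the names 
CheckpointStore, max_retained and on_job_terminated are hypothetical and exist 
only for illustration.

```python
from collections import deque


class CheckpointStore:
    """Toy model of the proposed retention policy (hypothetical, not Flink code)."""

    def __init__(self, max_retained):
        if max_retained < 1:
            raise ValueError("max_retained must be at least 1")
        self.max_retained = max_retained
        self._checkpoints = deque()  # oldest checkpoint ID first

    def add(self, checkpoint_id):
        """Record a completed checkpoint and return the IDs evicted by the cap."""
        self._checkpoints.append(checkpoint_id)
        evicted = []
        # Case 1: the configured maximum is exceeded, so drop the oldest entries.
        while len(self._checkpoints) > self.max_retained:
            evicted.append(self._checkpoints.popleft())
        return evicted

    def delete(self, checkpoint_id):
        """Case 2: manual deletion. Raises ValueError if the ID is not retained."""
        self._checkpoints.remove(checkpoint_id)

    def on_job_terminated(self, action):
        """Treat 'stop' and 'cancel' alike: neither deletes any checkpoint."""
        if action not in ("stop", "cancel"):
            raise ValueError(f"unknown action: {action}")
        return list(self._checkpoints)

    def retained(self):
        """Return the retained checkpoint IDs, oldest first."""
        return list(self._checkpoints)
```

With max_retained=2, adding checkpoints 1, 2 and 3 evicts only checkpoint 1, 
and a later 'stop' leaves checkpoints 2 and 3 in place, so an earlier 
checkpoint is still available for a restart.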



[jira] [Commented] (FLINK-17487) Do not delete old checkpoints when stop the job.

2020-05-09 Thread nobleyd (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103612#comment-17103612
 ] 

nobleyd commented on FLINK-17487:
-

Someone may say that if I need to restart the job from an older checkpoint, I 
should cancel it rather than stop it (stopping generates a savepoint and deletes 
all checkpoints). However, I use 'stop' rather than 'cancel' for safety (in 
case of unexpected errors).
