[jira] [Commented] (FLINK-12619) Support TERMINATE/SUSPEND Job with Checkpoint

Congxian Qiu(klion26) (JIRA) Tue, 28 May 2019 03:37:48 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849589#comment-16849589
 ]


Congxian Qiu(klion26) commented on FLINK-12619:
-----------------------------------------------

[~aljoscha] thanks for your reply.

For this issue, I plan to take the following steps (most of them will reuse the code 
of FLINK-11458):
 * add the required functions and RPCs in JM and TM for this issue:
 ** {{CheckpointCoordinator#triggerSynchronousCheckpoint}} (aligned with 
{{triggerSynchronousSavepoint}})
 ** {{SchedulerNG#stopWithCheckpoint}} (aligned with {{stopWithSavepoint}})
 ** a new {{CheckpointType}} named {{SYNC_CHECKPOINT}} (aligned with 
{{SYNC_SAVEPOINT}})
 ** {{JobMaster#stopWithCheckpoint}} (aligned with {{stopWithSavepoint}})
 ** Adjust the code paths that currently only support sync savepoints so that they also allow sync checkpoints
 ** The tests needed for this
 * expose this in the CLI
 ** will add an option that takes no parameter (it will reuse the preconfigured 
checkpoint directory), much like {{CliFrontendParser.STOP_WITH_SAVEPOINT}}
 * add a REST API for this
 ** will add the endpoint, RESTful gateway, trigger handler, request body and so 
on (like FLINK-11458)
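
To make the first group of steps a bit more concrete, here is a rough, self-contained sketch of how the pieces could fit together. It is not Flink code: {{Coordinator}} and {{Job}} are hypothetical stand-ins for {{CheckpointCoordinator}} and {{JobMaster}}/{{SchedulerNG}}, the enum only mimics {{CheckpointType}}, and the directory and path naming are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;

/** Illustrative sketch only; these types mimic the proposal and are NOT Flink's real classes. */
public class StopWithCheckpointSketch {

    /** Hypothetical mirror of CheckpointType, with the proposed SYNC_CHECKPOINT entry. */
    enum CheckpointType {
        CHECKPOINT(false, false),
        SAVEPOINT(true, false),
        SYNC_SAVEPOINT(true, true),
        SYNC_CHECKPOINT(false, true); // proposed addition, counterpart of SYNC_SAVEPOINT

        private final boolean savepoint;
        private final boolean synchronous;

        CheckpointType(boolean savepoint, boolean synchronous) {
            this.savepoint = savepoint;
            this.synchronous = synchronous;
        }

        boolean isSavepoint() { return savepoint; }
        boolean isSynchronous() { return synchronous; }
    }

    /** Hypothetical stand-in for the checkpoint coordinator. */
    static class Coordinator {
        private final String checkpointDir; // the preconfigured checkpoint directory
        private long nextId = 1;

        Coordinator(String checkpointDir) { this.checkpointDir = checkpointDir; }

        /** Existing style of entry point: the caller supplies a target directory. */
        CompletableFuture<String> triggerSynchronousSavepoint(String targetDir) {
            return trigger(CheckpointType.SYNC_SAVEPOINT, targetDir);
        }

        /** Proposed counterpart: no parameter, reuses the preconfigured directory. */
        CompletableFuture<String> triggerSynchronousCheckpoint() {
            return trigger(CheckpointType.SYNC_CHECKPOINT, checkpointDir);
        }

        // Pretends the snapshot completes immediately and returns its (made-up) path.
        private CompletableFuture<String> trigger(CheckpointType type, String dir) {
            String prefix = type.isSavepoint() ? "savepoint-" : "chk-";
            return CompletableFuture.completedFuture(dir + "/" + prefix + nextId++);
        }
    }

    /** Hypothetical stand-in for the JobMaster / SchedulerNG side. */
    static class Job {
        private final Coordinator coordinator;
        private boolean running = true;

        Job(Coordinator coordinator) { this.coordinator = coordinator; }

        /** Takes a synchronous checkpoint first, then marks the job as stopped. */
        CompletableFuture<String> stopWithCheckpoint() {
            return coordinator.triggerSynchronousCheckpoint().thenApply(path -> {
                running = false;
                return path;
            });
        }

        boolean isRunning() { return running; }
    }

    public static void main(String[] args) {
        Job job = new Job(new Coordinator("hdfs:///flink/checkpoints"));
        String path = job.stopWithCheckpoint().join();
        System.out.println("stopped=" + !job.isRunning() + ", checkpoint=" + path);
    }
}
```

The point of the sketch is only the shape: the checkpoint variant takes no target path, and the job is stopped only after the synchronous checkpoint has completed.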

What do you think?

> Support TERMINATE/SUSPEND Job with Checkpoint
> ---------------------------------------------
>
>                 Key: FLINK-12619
>                 URL: https://issues.apache.org/jira/browse/FLINK-12619
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / State Backends
>            Reporter: Congxian Qiu(klion26)
>            Assignee: Congxian Qiu(klion26)
>            Priority: Major
>
> Inspired by the idea of FLINK-11458, we propose to support terminating/suspending 
> a job with a checkpoint. This improvement works together with the incremental and 
> external checkpoint features: if checkpoints are retained and this feature 
> is configured, we will trigger a checkpoint before the job stops. It could 
> accelerate job recovery a lot since:
> 1. No source rewinding is required any more.
> 2. It's much faster than taking a savepoint, since incremental checkpointing is 
> enabled.
> Please note that conceptually savepoints are different from checkpoints in a 
> similar way that backups are different from recovery logs in traditional 
> database systems. So we suggest using this feature only for job recovery, 
> while sticking with FLINK-11458 for the 
> upgrading/cross-cluster-job-migration/state-backend-switch cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
