[ 
https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315371#comment-17315371
 ] 

Till Rohrmann commented on FLINK-21999:
---------------------------------------

I think the current behaviour is the following: 

The {{CheckpointCoordinator}} is only left unstarted if {{CheckpointConfig == 
null}}. This is currently only the case if the user uses the 
{{ExecutionEnvironment}} (e.g. the {{DataSet}} API).

When using the {{StreamExecutionEnvironment}}, the {{CheckpointConfig}} is 
always set. This will always instantiate the {{CheckpointCoordinator}} and also 
configure the state backends. Whether the {{CheckpointCoordinator}} triggers 
periodic checkpoints or not is controlled by the interval. That is also what 
happens if you call {{CheckpointConfig.disableCheckpointing}}. This does not 
disable the instantiation of the {{CheckpointCoordinator}} but only the 
periodic checkpoint triggering.
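
To make the distinction concrete, here is a minimal sketch of the behaviour as I understand it. It is a simplified model, not Flink's actual code: {{CheckpointSetupSketch}}, {{Config}} and both predicate methods are hypothetical names, and resetting the interval to -1 on disable is an assumption of the sketch.

```java
// Simplified model of the behaviour described above; these are NOT Flink's real classes.
final class CheckpointSetupSketch {

    /** Hypothetical stand-in for Flink's CheckpointConfig. */
    static final class Config {
        // Assumption: a non-positive interval marks periodic checkpointing as off.
        long intervalMillis = -1L;

        void enableCheckpointing(long millis) {
            if (millis <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            intervalMillis = millis;
        }

        // Only resets the interval; the config object itself stays in place.
        void disableCheckpointing() {
            intervalMillis = -1L;
        }
    }

    /** The coordinator is created whenever a config exists, regardless of the interval. */
    static boolean instantiatesCoordinator(Config config) {
        return config != null;
    }

    /** Periodic triggering additionally requires a usable interval. */
    static boolean triggersPeriodicCheckpoints(Config config) {
        return config != null
                && config.intervalMillis > 0
                && config.intervalMillis < Long.MAX_VALUE;
    }
}
```

In this model a config on which {{disableCheckpointing}} was called still leads to a coordinator, just without periodic triggering, while a {{null}} config (the {{DataSet}} case) leads to no coordinator at all.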

With the introduction of the BATCH execution mode for streaming jobs, this 
separation has been blurred further. Here we want to instantiate the state 
backends but do not want to enable periodic checkpointing. That's why the 
{{StreamGraphGenerator}} disables the periodic checkpointing.

I think the underlying problem is that the state backend configuration is 
coupled with the {{CheckpointCoordinator}} configuration, which should no longer 
be the case given the BATCH execution mode. But also keep in mind that the 
{{CheckpointCoordinator}} is needed for executing savepoint requests from the 
user. Hence, the checkpoint interval alone is not sufficient to decide whether 
to instantiate the {{CheckpointCoordinator}} or not.

Given that this seems to be a bit more involved, I would suggest not making this 
change at the moment, or at least starting with a proper proposal for how to 
configure which component is enabled and when.

> The logic about whether Checkpoint is enabled.
> ----------------------------------------------
>
>                 Key: FLINK-21999
>                 URL: https://issues.apache.org/jira/browse/FLINK-21999
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>            Reporter: ZhangWei
>            Assignee: ZhangWei
>            Priority: Major
>              Labels: pull-request-available
>
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled
>  assumes checkpointing is enabled when JobCheckpointingSettings is not null. 
> This is not enough: we must also guarantee that the checkpoint interval is within 
> [MINIMAL_CHECKPOINT_TIME, Long.MaxValue), as 
> JobGraph#isCheckpointingEnabled does.
>    In the current implementation, when we do not set the checkpoint interval, 
> leaving it at the default value -1, the interval is changed to Long.MaxValue. Thus 
> DefaultExecutionGraphBuilder#isCheckpointingEnabled returns true, which is 
> not correct.
> In addition, different classes assume checkpointing is enabled for 
> different interval ranges:
> 1. CheckpointConfig -> (0,Long.MaxValue]
> 2. JobGraph -> (0,Long.MaxValue)
> This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, 
> Long.MaxValue).
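
For reference, the range check proposed in the quoted description could look roughly like the sketch below. {{IntervalRangeSketch}} and its method are hypothetical names, and the 10 ms value for {{MINIMAL_CHECKPOINT_TIME}} is an assumption that should be checked against the Flink sources. Note that this only answers whether periodic checkpoints are triggered; it does not answer whether the {{CheckpointCoordinator}} is needed.

```java
// Sketch of the interval check from the issue description; not Flink's actual code.
final class IntervalRangeSketch {

    // Assumption: 10 ms mirrors Flink's MINIMAL_CHECKPOINT_TIME; verify against the source.
    static final long MINIMAL_CHECKPOINT_TIME = 10L;

    /** True only for intervals in [MINIMAL_CHECKPOINT_TIME, Long.MAX_VALUE). */
    static boolean isPeriodicCheckpointingEnabled(long intervalMillis) {
        return intervalMillis >= MINIMAL_CHECKPOINT_TIME
                && intervalMillis < Long.MAX_VALUE;
    }
}
```

With this check, both the default of -1 and the converted value {{Long.MAX_VALUE}} count as disabled.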



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
