[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.

2021-04-19 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325361#comment-17325361
 ] 

Flink Jira Bot commented on FLINK-21999:


This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> The logic about whether Checkpoint is enabled.
> --
>
> Key: FLINK-21999
> URL: https://issues.apache.org/jira/browse/FLINK-21999
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: ZhangWei
>Assignee: ZhangWei
>Priority: Major
>  Labels: pull-request-available, stale-assigned
>
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled
>  assumes checkpoint enabled when JobCheckpointingSettings is not null. While 
> this is not enough, we must also guarantee the checkpoint interval is between 
> [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the 
> JobGraph#isCheckpointingEnabled does.
>In current implement, when we do not set checkpoint interval, leaving it 
> the default value -1, the interval  will be changed to Long.MaxValue. Thus 
> DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is 
> not correct.
> in addition, there are different classes assume checkpoint enabled with 
> different interval range.
> 1. CheckpointConfig -> (0,Long.MaxValue*]*.
> 2. JobGraph -> (0,Long.MaxValue)
> This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, 
> Long.MaxValue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.

2021-04-06 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315375#comment-17315375
 ] 

Till Rohrmann commented on FLINK-21999:
---

Maybe the easiest fix would be to make the naming of some of the methods a bit 
more explicit wrt what they mean. E.g. 
{{DefaultExecutionGraphBuilder.isCheckpointingEnabled}} probably should rather 
be called {{shouldInstantiateCheckpointCoordinator()}}.

> The logic about whether Checkpoint is enabled.
> --
>
> Key: FLINK-21999
> URL: https://issues.apache.org/jira/browse/FLINK-21999
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: ZhangWei
>Assignee: ZhangWei
>Priority: Major
>  Labels: pull-request-available
>
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled
>  assumes checkpoint enabled when JobCheckpointingSettings is not null. While 
> this is not enough, we must also guarantee the checkpoint interval is between 
> [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the 
> JobGraph#isCheckpointingEnabled does.
>In current implement, when we do not set checkpoint interval, leaving it 
> the default value -1, the interval  will be changed to Long.MaxValue. Thus 
> DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is 
> not correct.
> in addition, there are different classes assume checkpoint enabled with 
> different interval range.
> 1. CheckpointConfig -> (0,Long.MaxValue*]*.
> 2. JobGraph -> (0,Long.MaxValue)
> This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, 
> Long.MaxValue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.

2021-04-06 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315371#comment-17315371
 ] 

Till Rohrmann commented on FLINK-21999:
---

I think the current behaviour is the following: 

Only don't start the {{CheckpointCoordinator}} if the {{CheckpointConfig == 
null}}. This is currently only the case if the user uses the 
{{ExecutionEnvironment}} (e.g. {{DataSet}} API).

When using the {{StreamExecutionEnvironment}}, then the {{CheckpointConfig}} is 
always set. This will always instantiate the {{CheckpointCoordinator}} and also 
configure the state backends. Whether the {{CheckpointCoordinator}} triggers 
periodic checkpoints or not is controlled by the interval. That is also what 
happens if you call {{CheckpointConfig.disableCheckpointing}}. This does not 
disable the instantiation of the {{CheckpointCoordinator}} but only the 
periodic checkpoint triggering.

With the introduction of the BATCH execution mode for streaming jobs the whole 
separation has been blurred further. Here we want to instantiate the state 
backends but don't want to enable periodic checkpointing. That's why the 
{{StreamGraphGenerator}} disables the periodic checkpointing.

I think the underlying problem is that the state backend configuration is 
coupled with the {{CheckpointCoordinator}} configuration which should no longer 
be the case due to the BATCH execution mode. But also keep in mind that the 
{{CheckpointCoordinator}} is needed for executing savepoint requests from the 
user. Hence, the checkpoint interval alone is not sufficient to decide whether 
to instantiate the {{CheckpointCoordinator}} or not.

Given that this seems to be a bit more involved, I would suggest to not do this 
change at the moment or at least start with a proper proposal how to configure 
what and when to enable what component.

> The logic about whether Checkpoint is enabled.
> --
>
> Key: FLINK-21999
> URL: https://issues.apache.org/jira/browse/FLINK-21999
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: ZhangWei
>Assignee: ZhangWei
>Priority: Major
>  Labels: pull-request-available
>
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled
>  assumes checkpoint enabled when JobCheckpointingSettings is not null. While 
> this is not enough, we must also guarantee the checkpoint interval is between 
> [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the 
> JobGraph#isCheckpointingEnabled does.
>In current implement, when we do not set checkpoint interval, leaving it 
> the default value -1, the interval  will be changed to Long.MaxValue. Thus 
> DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is 
> not correct.
> in addition, there are different classes assume checkpoint enabled with 
> different interval range.
> 1. CheckpointConfig -> (0,Long.MaxValue*]*.
> 2. JobGraph -> (0,Long.MaxValue)
> This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, 
> Long.MaxValue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.

2021-04-03 Thread ZhangWei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314236#comment-17314236
 ] 

ZhangWei commented on FLINK-21999:
--

[~trohrmann]
In current implementation,  when checkpoint interval is Long.MaxValue.
DefaultExecutionGraphBuilder will assume checkpoint is enabled also and in the 
executionGraph returned by DefaultExecutionGraphBuilder#buildGraph, it will set 
the related checkpoint configuration. This method will Instantiate a 
CheckpointCoordinator. 

So when I set the checkpoint enabled condition to also require checkpoint 
interval not equal to Long.MaxValue, the CheckpointCoordinator will not be 
Instantiated, then OperatorCoordinatorSchedulerTest will fail with 
CheckpointCoordinator is null.

In short, current implementation will always Instantiate the 
CheckpointCoordinator whenever the interval equals is Long.MaxValue or not. I 
am not sure what is the expected behavior.

In addition, I found CheckpointCoordinator#isPeriodicCheckpointingConfigured, 
what does period means here? what is the difference between period and not 
period then? Is period means regular checkpoint and not period means disable 
checkpoint then?

The failed CI tests result can be found below:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16033&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=05b74a19-4ee4-5036-c46f-ada307df6cf0

> The logic about whether Checkpoint is enabled.
> --
>
> Key: FLINK-21999
> URL: https://issues.apache.org/jira/browse/FLINK-21999
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: ZhangWei
>Assignee: ZhangWei
>Priority: Major
>  Labels: pull-request-available
>
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled
>  assumes checkpoint enabled when JobCheckpointingSettings is not null. While 
> this is not enough, we must also guarantee the checkpoint interval is between 
> [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the 
> JobGraph#isCheckpointingEnabled does.
>In current implement, when we do not set checkpoint interval, leaving it 
> the default value -1, the interval  will be changed to Long.MaxValue. Thus 
> DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is 
> not correct.
> in addition, there are different classes assume checkpoint enabled with 
> different interval range.
> 1. CheckpointConfig -> (0,Long.MaxValue*]*.
> 2. JobGraph -> (0,Long.MaxValue)
> This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, 
> Long.MaxValue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.

2021-03-31 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312192#comment-17312192
 ] 

Till Rohrmann commented on FLINK-21999:
---

Great, I've assigned you to this ticket now.

> The logic about whether Checkpoint is enabled.
> --
>
> Key: FLINK-21999
> URL: https://issues.apache.org/jira/browse/FLINK-21999
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: ZhangWei
>Assignee: ZhangWei
>Priority: Major
>
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled
>  assumes checkpoint enabled when JobCheckpointingSettings is not null. While 
> this is not enough, we must also guarantee the checkpoint interval is between 
> [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the 
> JobGraph#isCheckpointingEnabled does.
>In current implement, when we do not set checkpoint interval, leaving it 
> the default value -1, the interval  will be changed to Long.MaxValue. Thus 
> DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is 
> not correct.
> in addition, there are different classes assume checkpoint enabled with 
> different interval range.
> 1. CheckpointConfig -> (0,Long.MaxValue*]*.
> 2. JobGraph -> (0,Long.MaxValue)
> This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, 
> Long.MaxValue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.

2021-03-30 Thread ZhangWei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311443#comment-17311443
 ] 

ZhangWei commented on FLINK-21999:
--

[~trohrmann] glad to take it!


> The logic about whether Checkpoint is enabled.
> --
>
> Key: FLINK-21999
> URL: https://issues.apache.org/jira/browse/FLINK-21999
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: ZhangWei
>Priority: Major
>
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled
>  assumes checkpoint enabled when JobCheckpointingSettings is not null. While 
> this is not enough, we must also guarantee the checkpoint interval is between 
> [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the 
> JobGraph#isCheckpointingEnabled does.
>In current implement, when we do not set checkpoint interval, leaving it 
> the default value -1, the interval  will be changed to Long.MaxValue. Thus 
> DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is 
> not correct.
> in addition, there are different classes assume checkpoint enabled with 
> different interval range.
> 1. CheckpointConfig -> (0,Long.MaxValue*]*.
> 2. JobGraph -> (0,Long.MaxValue)
> This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, 
> Long.MaxValue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.

2021-03-30 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311327#comment-17311327
 ] 

Till Rohrmann commented on FLINK-21999:
---

Thanks for reporting this issue [~Amon]. I agree that this behaviour is not 
consistent and should be fixed. Do you wanna try to fix the problem?

> The logic about whether Checkpoint is enabled.
> --
>
> Key: FLINK-21999
> URL: https://issues.apache.org/jira/browse/FLINK-21999
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: ZhangWei
>Priority: Major
>
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled
>  assumes checkpoint enabled when JobCheckpointingSettings is not null. While 
> this is not enough, we must also guarantee the checkpoint interval is between 
> [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the 
> JobGraph#isCheckpointingEnabled does.
>In current implement, when we do not set checkpoint interval, leaving it 
> the default value -1, the interval  will be changed to Long.MaxValue. Thus 
> DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is 
> not correct.
> in addition, there are different classes assume checkpoint enabled with 
> different interval range.
> 1. CheckpointConfig -> (0,Long.MaxValue*]*.
> 2. JobGraph -> (0,Long.MaxValue)
> This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, 
> Long.MaxValue).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)