[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.
[ https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325361#comment-17325361 ] Flink Jira Bot commented on FLINK-21999: This issue is assigned but has not received an update in 7 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned. > The logic about whether Checkpoint is enabled. > -- > > Key: FLINK-21999 > URL: https://issues.apache.org/jira/browse/FLINK-21999 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: ZhangWei >Assignee: ZhangWei >Priority: Major > Labels: pull-request-available, stale-assigned > > org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled > assumes checkpoint enabled when JobCheckpointingSettings is not null. While > this is not enough, we must also guarantee the checkpoint interval is between > [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the > JobGraph#isCheckpointingEnabled does. >In current implement, when we do not set checkpoint interval, leaving it > the default value -1, the interval will be changed to Long.MaxValue. Thus > DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is > not correct. > in addition, there are different classes assume checkpoint enabled with > different interval range. > 1. CheckpointConfig -> (0,Long.MaxValue*]*. > 2. JobGraph -> (0,Long.MaxValue) > This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, > Long.MaxValue). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.
[ https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315375#comment-17315375 ] Till Rohrmann commented on FLINK-21999: --- Maybe the easiest fix would be to make the naming of some of the methods a bit more explicit wrt what they mean. E.g. {{DefaultExecutionGraphBuilder.isCheckpointingEnabled}} probably should rather be called {{shouldInstantiateCheckpointCoordinator()}}. > The logic about whether Checkpoint is enabled. > -- > > Key: FLINK-21999 > URL: https://issues.apache.org/jira/browse/FLINK-21999 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: ZhangWei >Assignee: ZhangWei >Priority: Major > Labels: pull-request-available > > org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled > assumes checkpoint enabled when JobCheckpointingSettings is not null. While > this is not enough, we must also guarantee the checkpoint interval is between > [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the > JobGraph#isCheckpointingEnabled does. >In current implement, when we do not set checkpoint interval, leaving it > the default value -1, the interval will be changed to Long.MaxValue. Thus > DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is > not correct. > in addition, there are different classes assume checkpoint enabled with > different interval range. > 1. CheckpointConfig -> (0,Long.MaxValue*]*. > 2. JobGraph -> (0,Long.MaxValue) > This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, > Long.MaxValue). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.
[ https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315371#comment-17315371 ] Till Rohrmann commented on FLINK-21999: --- I think the current behaviour is the following: Only don't start the {{CheckpointCoordinator}} if the {{CheckpointConfig == null}}. This is currently only the case if the user uses the {{ExecutionEnvironment}} (e.g. {{DataSet}} API). When using the {{StreamExecutionEnvironment}}, then the {{CheckpointConfig}} is always set. This will always instantiate the {{CheckpointCoordinator}} and also configure the state backends. Whether the {{CheckpointCoordinator}} triggers periodic checkpoints or not is controlled by the interval. That is also what happens if you call {{CheckpointConfig.disableCheckpointing}}. This does not disable the instantiation of the {{CheckpointCoordinator}} but only the periodic checkpoint triggering. With the introduction of the BATCH execution mode for streaming jobs the whole separation has been blurred further. Here we want to instantiate the state backends but don't want to enable periodic checkpointing. That's why the {{StreamGraphGenerator}} disables the periodic checkpointing. I think the underlying problem is that the state backend configuration is coupled with the {{CheckpointCoordinator}} configuration which should no longer be the case due to the BATCH execution mode. But also keep in mind that the {{CheckpointCoordinator}} is needed for executing savepoint requests from the user. Hence, the checkpoint interval alone is not sufficient to decide whether to instantiate the {{CheckpointCoordinator}} or not. Given that this seems to be a bit more involved, I would suggest to not do this change at the moment or at least start with a proper proposal how to configure what and when to enable what component. > The logic about whether Checkpoint is enabled. > -- > > Key: FLINK-21999 > URL: https://issues.apache.org/jira/browse/FLINK-21999 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: ZhangWei >Assignee: ZhangWei >Priority: Major > Labels: pull-request-available > > org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled > assumes checkpoint enabled when JobCheckpointingSettings is not null. While > this is not enough, we must also guarantee the checkpoint interval is between > [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the > JobGraph#isCheckpointingEnabled does. >In current implement, when we do not set checkpoint interval, leaving it > the default value -1, the interval will be changed to Long.MaxValue. Thus > DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is > not correct. > in addition, there are different classes assume checkpoint enabled with > different interval range. > 1. CheckpointConfig -> (0,Long.MaxValue*]*. > 2. JobGraph -> (0,Long.MaxValue) > This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, > Long.MaxValue). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.
[ https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314236#comment-17314236 ] ZhangWei commented on FLINK-21999: -- [~trohrmann] In current implementation, when checkpoint interval is Long.MaxValue. DefaultExecutionGraphBuilder will assume checkpoint is enabled also and in the executionGraph returned by DefaultExecutionGraphBuilder#buildGraph, it will set the related checkpoint configuration. This method will Instantiate a CheckpointCoordinator. So when I set the checkpoint enabled condition to also require checkpoint interval not equal to Long.MaxValue, the CheckpointCoordinator will not be Instantiated, then OperatorCoordinatorSchedulerTest will fail with CheckpointCoordinator is null. In short, current implementation will always Instantiate the CheckpointCoordinator whenever the interval equals is Long.MaxValue or not. I am not sure what is the expected behavior. In addition, I found CheckpointCoordinator#isPeriodicCheckpointingConfigured, what does period means here? what is the difference between period and not period then? Is period means regular checkpoint and not period means disable checkpoint then? The failed CI tests result can be found below: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=16033&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=05b74a19-4ee4-5036-c46f-ada307df6cf0 > The logic about whether Checkpoint is enabled. > -- > > Key: FLINK-21999 > URL: https://issues.apache.org/jira/browse/FLINK-21999 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: ZhangWei >Assignee: ZhangWei >Priority: Major > Labels: pull-request-available > > org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled > assumes checkpoint enabled when JobCheckpointingSettings is not null. While > this is not enough, we must also guarantee the checkpoint interval is between > [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the > JobGraph#isCheckpointingEnabled does. >In current implement, when we do not set checkpoint interval, leaving it > the default value -1, the interval will be changed to Long.MaxValue. Thus > DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is > not correct. > in addition, there are different classes assume checkpoint enabled with > different interval range. > 1. CheckpointConfig -> (0,Long.MaxValue*]*. > 2. JobGraph -> (0,Long.MaxValue) > This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, > Long.MaxValue). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.
[ https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312192#comment-17312192 ] Till Rohrmann commented on FLINK-21999: --- Great, I've assigned you to this ticket now. > The logic about whether Checkpoint is enabled. > -- > > Key: FLINK-21999 > URL: https://issues.apache.org/jira/browse/FLINK-21999 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: ZhangWei >Assignee: ZhangWei >Priority: Major > > org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled > assumes checkpoint enabled when JobCheckpointingSettings is not null. While > this is not enough, we must also guarantee the checkpoint interval is between > [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the > JobGraph#isCheckpointingEnabled does. >In current implement, when we do not set checkpoint interval, leaving it > the default value -1, the interval will be changed to Long.MaxValue. Thus > DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is > not correct. > in addition, there are different classes assume checkpoint enabled with > different interval range. > 1. CheckpointConfig -> (0,Long.MaxValue*]*. > 2. JobGraph -> (0,Long.MaxValue) > This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, > Long.MaxValue). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.
[ https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311443#comment-17311443 ] ZhangWei commented on FLINK-21999: -- [~trohrmann] glad to take it! > The logic about whether Checkpoint is enabled. > -- > > Key: FLINK-21999 > URL: https://issues.apache.org/jira/browse/FLINK-21999 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: ZhangWei >Priority: Major > > org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled > assumes checkpoint enabled when JobCheckpointingSettings is not null. While > this is not enough, we must also guarantee the checkpoint interval is between > [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the > JobGraph#isCheckpointingEnabled does. >In current implement, when we do not set checkpoint interval, leaving it > the default value -1, the interval will be changed to Long.MaxValue. Thus > DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is > not correct. > in addition, there are different classes assume checkpoint enabled with > different interval range. > 1. CheckpointConfig -> (0,Long.MaxValue*]*. > 2. JobGraph -> (0,Long.MaxValue) > This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, > Long.MaxValue). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-21999) The logic about whether Checkpoint is enabled.
[ https://issues.apache.org/jira/browse/FLINK-21999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311327#comment-17311327 ] Till Rohrmann commented on FLINK-21999: --- Thanks for reporting this issue [~Amon]. I agree that this behaviour is not consistent and should be fixed. Do you wanna try to fix the problem? > The logic about whether Checkpoint is enabled. > -- > > Key: FLINK-21999 > URL: https://issues.apache.org/jira/browse/FLINK-21999 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: ZhangWei >Priority: Major > > org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder#isCheckpointingEnabled > assumes checkpoint enabled when JobCheckpointingSettings is not null. While > this is not enough, we must also guarantee the checkpoint interval is between > [MINIMAL_CHECKPOINT_TIME, Long.MaxValue). That is like the > JobGraph#isCheckpointingEnabled does. >In current implement, when we do not set checkpoint interval, leaving it > the default value -1, the interval will be changed to Long.MaxValue. Thus > DefaultExecutionGraphBuilder#isCheckpointingEnabled will return true. That is > not correct. > in addition, there are different classes assume checkpoint enabled with > different interval range. > 1. CheckpointConfig -> (0,Long.MaxValue*]*. > 2. JobGraph -> (0,Long.MaxValue) > This is not consistent. And the correct range is [MINIMAL_CHECKPOINT_TIME, > Long.MaxValue). -- This message was sent by Atlassian Jira (v8.3.4#803005)