[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-08-22 Thread Ravi Bhushan Ratnakar (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182272#comment-17182272
 ] 

Ravi Bhushan Ratnakar commented on FLINK-18675:
---

[~dian.fu]Fu, this ticket has been addressed by FLINK-18856

> Checkpoint not maintaining minimum pause duration between checkpoints
> -
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0
> Environment: !image.png!
>Reporter: Ravi Bhushan Ratnakar
>Priority: Critical
> Fix For: 1.12.0, 1.11.2
>
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes 
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an usual behaviour. *When a checkpoint completes successfully* 
> *and if it's end to end duration is almost equal or greater than Minimum 
> pause duration then the next checkpoint gets triggered immediately without 
> maintaining the Minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-08-21 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182236#comment-17182236
 ] 

Dian Fu commented on FLINK-18675:
-

It seems duplicate with FLINK-18856 and has been addressed there. If this is 
the case, we should close this issue. [~raviratnakar]  Could you help to check 
if this is the case?

> Checkpoint not maintaining minimum pause duration between checkpoints
> -
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0
> Environment: !image.png!
>Reporter: Ravi Bhushan Ratnakar
>Priority: Critical
> Fix For: 1.12.0, 1.11.2
>
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes 
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an usual behaviour. *When a checkpoint completes successfully* 
> *and if it's end to end duration is almost equal or greater than Minimum 
> pause duration then the next checkpoint gets triggered immediately without 
> maintaining the Minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-08-09 Thread Congxian Qiu(klion26) (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173755#comment-17173755
 ] 

Congxian Qiu(klion26) commented on FLINK-18675:
---

Seems there is a similar issue FLINK-18856

> Checkpoint not maintaining minimum pause duration between checkpoints
> -
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0
> Environment: !image.png!
>Reporter: Ravi Bhushan Ratnakar
>Priority: Critical
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes 
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an usual behaviour. *When a checkpoint completes successfully* 
> *and if it's end to end duration is almost equal or greater than Minimum 
> pause duration then the next checkpoint gets triggered immediately without 
> maintaining the Minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-08-03 Thread Congxian Qiu(klion26) (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170497#comment-17170497
 ] 

Congxian Qiu(klion26) commented on FLINK-18675:
---

[~raviratnakar] I think the problem here is that {{CheckpointRequestDecider}} 
has a wrong value of {{lastCheckpointCompletionRelativeTime}} when checking 
whether the checkpoint request is too early.

1. We retrieve the value of {{lastCheckpointCompletionRelativeTime}} when 
calling {{CheckpointRequestDecider#chooseRequestToExecute}} in 
{{CheckpointCoordinator#triggerCheckpoint}}
2. A pending checkpoint complete, and update the valuable 
{{pendingCheckpoints}} and {{lastCheckpointCompletionRelativeTime}}
3. In {{CheckpointRequestDecider#chooseRequestToExecute}} we use the previous 
{{lastCheckpointCompletionRelativeTime}} to check whether current checkpoint 
request is too early

I think we can get the value of {{lastCheckpointCompletionRelativeTime}} in 
{{CheckpointRequestDecider#chooseRequestToExecute}} here to solve the problem 
here.

> Checkpoint not maintaining minimum pause duration between checkpoints
> -
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0
> Environment: !image.png!
>Reporter: Ravi Bhushan Ratnakar
>Priority: Critical
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes 
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an usual behaviour. *When a checkpoint completes successfully* 
> *and if it's end to end duration is almost equal or greater than Minimum 
> pause duration then the next checkpoint gets triggered immediately without 
> maintaining the Minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-07-26 Thread Congxian Qiu(klion26) (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165387#comment-17165387
 ] 

Congxian Qiu(klion26) commented on FLINK-18675:
---

Hi [~raviratnakar] from the git history, the code at line number 1512 was added 
in 1.9.0(and seems there change would not affect this problem), IIUC, we would 
check whether the checkpoint can be triggered somewhere, I need to check it 
carefully as the code changed a lot. will reply here If found anything.

> Checkpoint not maintaining minimum pause duration between checkpoints
> -
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0
> Environment: !image.png!
>Reporter: Ravi Bhushan Ratnakar
>Priority: Critical
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes 
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an usual behaviour. *When a checkpoint completes successfully* 
> *and if it's end to end duration is almost equal or greater than Minimum 
> pause duration then the next checkpoint gets triggered immediately without 
> maintaining the Minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-07-23 Thread Ravi Bhushan Ratnakar (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163289#comment-17163289
 ] 

Ravi Bhushan Ratnakar commented on FLINK-18675:
---

[~klion26] from the attached screenshot, there is also one condition in 
checkpoint id 190,191 where checkpoint end to end duration is just below 
minimum pause duration however next checkpoint got triggered much early than 
the minimum pause duration. what do you think about the root cause of this 
issue?  is this due to "CheckpointCordinator class, at line number 1512 
,scheduleAtFixedRate method is being used."?

> Checkpoint not maintaining minimum pause duration between checkpoints
> -
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0
> Environment: !image.png!
>Reporter: Ravi Bhushan Ratnakar
>Priority: Critical
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes 
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an usual behaviour. *When a checkpoint completes successfully* 
> *and if it's end to end duration is almost equal or greater than Minimum 
> pause duration then the next checkpoint gets triggered immediately without 
> maintaining the Minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints

2020-07-22 Thread Ravi Bhushan Ratnakar (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162932#comment-17162932
 ] 

Ravi Bhushan Ratnakar commented on FLINK-18675:
---

As per my understanding of the code, in the CheckpointCordinator class, at line 
number 
[1512|[https://github.com/apache/flink/blob/release-1.11.0/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1512]]
 ,scheduleAtFixedRate method is being used. I think that we should use 
"[scheduleWithFixedDelay|[https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleWithFixedDelay-java.lang.Runnable-long-long-java.util.concurrent.TimeUnit-]]{{"}}

> Checkpoint not maintaining minimum pause duration between checkpoints
> -
>
> Key: FLINK-18675
> URL: https://issues.apache.org/jira/browse/FLINK-18675
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.11.0
> Environment: !image.png!
>Reporter: Ravi Bhushan Ratnakar
>Priority: Critical
> Attachments: image.png
>
>
> I am running a streaming job with Flink 1.11.0 using kubernetes 
> infrastructure. I have configured checkpoint configuration like below
> Interval - 3 minutes
> Minimum pause between checkpoints - 3 minutes
> Checkpoint timeout - 10 minutes
> Checkpointing Mode - Exactly Once
> Number of Concurrent Checkpoint - 1
>  
> Other configs
> Time Characteristics - Processing Time
>  
> I am observing an usual behaviour. *When a checkpoint completes successfully* 
> *and if it's end to end duration is almost equal or greater than Minimum 
> pause duration then the next checkpoint gets triggered immediately without 
> maintaining the Minimum pause duration*. Kindly notice this behaviour from 
> checkpoint id 194 onward in the attached screenshot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)