[ 
https://issues.apache.org/jira/browse/FLINK-23041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-23041:
-------------------------------------
    Release Note: 
The alignment timeout is now measured as the time between the start of a 
checkpoint (on the checkpoint coordinator) and the time when the checkpoint 
barrier is received by a task.

  was:
The semantics of the alignmentTimeout configuration have changed to the 
following:

The time between the start of a checkpoint (on the checkpoint coordinator) and 
the time when the checkpoint barrier is received by a task.


> Change local alignment timeout back to the global time out
> ----------------------------------------------------------
>
>                 Key: FLINK-23041
>                 URL: https://issues.apache.org/jira/browse/FLINK-23041
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>            Reporter: Anton Kalashnikov
>            Assignee: Anton Kalashnikov
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> Local alignment timeouts are very confusing and, especially without a timeout 
> on the outputs, they can significantly delay the switch to unaligned 
> checkpoints (UC).
> A problematic case is when all checkpoint barriers (CBs) are received with a 
> long delay because of back pressure, but they arrive at the same time. The 
> alignment time can be low (milliseconds), while the start delay is ~1 minute. 
> In that case the checkpoint doesn't time out to UC and passes the 
> responsibility for timing out down the stream.
>  
> So it is not transparent to the user why and when an aligned checkpoint (AC) 
> switches to UC. As mentioned before, the start delay is not correlated with 
> the alignment timeout because it doesn't take into account the time spent in 
> the output buffer. The alignment time is not fully correlated with the 
> alignment timeout because the alignment time doesn't take into account the 
> barrier announcement.
>  
> Based on this, the proposal is to change the semantics of the 
> alignmentTimeout configuration to mean:
> *The time between the start of a checkpoint (on the checkpoint coordinator) 
> and the time when the checkpoint barrier is received by a task.*
> By this definition, we get a kind of global timeout: if the AC hasn't 
> finished within alignmentTimeout, it is switched to UC.
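
The difference between the two semantics described above can be illustrated with a small, self-contained sketch. It is not Flink code: the class name, method names, and the 30-second timeout are hypothetical, the "local" check is simplified to the time since the first barrier arrived at the task, and clock differences between the coordinator and the task are ignored. The numbers follow the problematic case from the description (start delay of about 1 minute, alignment time of a few milliseconds).

```java
import java.time.Duration;

// Illustrative sketch only; names and values are assumptions, not Flink APIs.
public class AlignmentTimeoutSketch {

    /** Simplified "local" semantics: only time spent aligning at this task counts. */
    public static boolean timesOutLocally(long firstBarrierMs, long nowMs, Duration timeout) {
        return nowMs - firstBarrierMs >= timeout.toMillis();
    }

    /** "Global" semantics: time is measured from the checkpoint start on the coordinator. */
    public static boolean timesOutGlobally(long checkpointStartMs, long nowMs, Duration timeout) {
        return nowMs - checkpointStartMs >= timeout.toMillis();
    }

    public static void main(String[] args) {
        Duration timeout = Duration.ofSeconds(30); // assumed alignmentTimeout value
        long checkpointStartMs = 0L;               // checkpoint triggered on the coordinator
        long firstBarrierMs = 60_000L;             // barriers delayed ~1 minute by back pressure
        long nowMs = 60_005L;                      // all barriers arrive within a few milliseconds

        System.out.println("local timeout fires:  "
                + timesOutLocally(firstBarrierMs, nowMs, timeout));
        System.out.println("global timeout fires: "
                + timesOutGlobally(checkpointStartMs, nowMs, timeout));
    }
}
```

Under these assumptions the local check never fires, because the alignment itself takes only 5 ms, while the global check fires because more than 30 seconds have passed since the checkpoint started.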



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
