[ https://issues.apache.org/jira/browse/FLINK-23041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Wysakowicz updated FLINK-23041: ------------------------------------- Release Note: Now it's measured as the time between the start of a checkpoint(on the checkpoint coordinator) and the time when the checkpoint barrier is received by a task. was: The semantic of alignmentTimeout configuration has changed to such meaning: The time between the start of a checkpoint(on the checkpoint coordinator) and the time when the checkpoint barrier is received by a task. > Change local alignment timeout back to the global time out > ---------------------------------------------------------- > > Key: FLINK-23041 > URL: https://issues.apache.org/jira/browse/FLINK-23041 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing > Reporter: Anton Kalashnikov > Assignee: Anton Kalashnikov > Priority: Major > Labels: pull-request-available > Fix For: 1.14.0 > > > Local alignment timeouts are very confusing and especially without timeout on > the outputs, they can significantly delay timeouting to UC. > Problematic case is when all CBs are received with long delay because of the > back pressure, but they arrive at the same time. Alignment time can be low > (milliseconds), while start delay is ~1 minute. In that case checkpoint > doesn't timeout to UC and is passing the responsibility to timeout down the > stream. > > So it is not so transparant for the user why and when AC switches to UC. As > mentioned before, the start delay is not correlated with the alignment > timeout because it doesn't take into account time in output buffer. the > alignment time is not fully correlated with the alignment timeout because the > alignment time doesn't take into account the barrier announcement. > > Based on this, there is the proposal to change the semantic of > alignmentTimeout configuration to such meaning: > *The time between the starting of checkpoint(on the checkpont coordinator) > and the time when the checkpoint barrier will be received by task.* > By this definition, we will have kind of global timeout which says that if > the AC isn't finished for alignmentTimeout time it will be switched to UC. -- This message was sent by Atlassian Jira (v8.3.4#803005)