[ https://issues.apache.org/jira/browse/FLINK-32347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Richter updated FLINK-32347:
-----------------------------------
    Fix Version/s: 1.18.0

> Exceptions from the CompletedCheckpointStore are not registered by the CheckpointFailureManager
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-32347
>                 URL: https://issues.apache.org/jira/browse/FLINK-32347
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.15.3, 1.16.2, 1.17.1
>            Reporter: Tigran Manasyan
>            Assignee: Stefan Richter
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.18.0
>
>
> Currently, if an error occurs while saving a completed checkpoint in the
> _CompletedCheckpointStore_, _CheckpointCoordinator_ doesn't call
> _CheckpointFailureManager_ to handle the error. As a result, errors from
> _CompletedCheckpointStore_ don't increase the failed checkpoints count, and the
> _'execution.checkpointing.tolerable-failed-checkpoints'_ option does not
> limit the number of errors of this kind in any way.
> A possible solution is to move the notification of
> _CheckpointFailureManager_ about a successful checkpoint to after the
> completed checkpoint has been stored in the _CompletedCheckpointStore_, and to
> pass the exception to the _CheckpointFailureManager_ in the
> _CheckpointCoordinator#[addCompletedCheckpointToStoreAndSubsumeOldest()|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java#L1440]_
> method.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
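
The ordering proposed in the description can be illustrated with a small, self-contained sketch. It is not Flink code: `Store`, `FailureManager`, `completeCheckpoint`, and the consecutive-failure counting are hypothetical stand-ins, chosen only to show success being reported after the store write and a store exception being handed to the failure manager.

```java
import java.io.IOException;

public class CheckpointStoreFailureSketch {

    /** Hypothetical stand-in for the completed checkpoint store. */
    interface Store {
        void add(long checkpointId) throws Exception;
    }

    /** Hypothetical stand-in for the failure manager; counts consecutive failures (an assumption). */
    static final class FailureManager {
        private final int tolerableFailures;
        private int continuousFailures;

        FailureManager(int tolerableFailures) {
            this.tolerableFailures = tolerableFailures;
        }

        /** A successful checkpoint resets the counter. */
        void onSuccess() {
            continuousFailures = 0;
        }

        /** A failed checkpoint increments the counter and fails once the limit is exceeded. */
        void onFailure(Exception cause) {
            continuousFailures++;
            if (continuousFailures > tolerableFailures) {
                throw new IllegalStateException("Exceeded tolerable failed checkpoints", cause);
            }
        }

        int continuousFailures() {
            return continuousFailures;
        }
    }

    /**
     * Stores the completed checkpoint first and reports success only afterwards.
     * An exception from the store is passed to the failure manager instead of being lost.
     */
    static boolean completeCheckpoint(long checkpointId, Store store, FailureManager manager) {
        try {
            store.add(checkpointId);
        } catch (Exception e) {
            manager.onFailure(e);
            return false;
        }
        manager.onSuccess();
        return true;
    }

    public static void main(String[] args) {
        FailureManager manager = new FailureManager(2);
        // A store that always fails, simulating an unavailable backend.
        Store failingStore = id -> {
            throw new IOException("store unavailable");
        };

        completeCheckpoint(1L, failingStore, manager);
        completeCheckpoint(2L, failingStore, manager);
        System.out.println("consecutive failures: " + manager.continuousFailures());

        try {
            completeCheckpoint(3L, failingStore, manager);
        } catch (IllegalStateException e) {
            System.out.println("limit reached: " + e.getMessage());
        }
    }
}
```

With this ordering, a store failure counts against the tolerated limit in the same way as any other checkpoint failure, and the counter is reset only once the checkpoint has actually been stored.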