[ https://issues.apache.org/jira/browse/FLINK-10724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
vinoyang updated FLINK-10724:
-----------------------------
    Comment: was deleted

(was: The main failure reasons are listed below:
{code:java}
CheckpointExpired("Checkpoint expired before completing")
CheckpointSubsumed("Checkpoint has been subsumed")
CheckpointDeclined("Checkpoint was declined (tasks not ready)")
CheckpointError("Checkpoint failed")
{code}
They could be defined as enum values in {{CheckpointFailureReason}}.

Like {{CheckpointTriggerResult}}, I also suggest introducing a class, for example named {{CheckpointInvokeResult}}, which contains a {{CheckpointFailureReason}} and represents the invoke result.

When we count the number of failures, we also want to include the trigger result of savepoints, so the {{CheckpointFailureManager}} will respond to both {{CheckpointTriggerResult}} and {{CheckpointInvokeResult}}.

What do you think, [~azagrebin] and [~till.rohrmann]?
)

> Refactor failure handling in check point coordinator
> ----------------------------------------------------
>
>                 Key: FLINK-10724
>                 URL: https://issues.apache.org/jira/browse/FLINK-10724
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>            Reporter: Andrey Zagrebin
>            Assignee: vinoyang
>            Priority: Major
>
> At the moment, failure handling of asynchronously triggered checkpoints in
> the checkpoint coordinator happens in different places. We could organise it
> in a similar way to the failure handling of synchronously triggered
> checkpoints in *CheckpointTriggerResult*, where we classify error cases.
> This will simplify e.g. the integration of an error counter for FLINK-10074.
> See also the discussion here: [https://github.com/apache/flink/pull/6567]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
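
As a rough illustration of the proposal in the deleted comment, the failure reasons and the result class could be sketched as follows. This is only a sketch under stated assumptions: the constant names, the message() accessor, and the success()/failure() factory methods are invented here for illustration, and CheckpointInvokeResult is only the name suggested in the comment, not a confirmed Flink class.

```java
// Sketch only: these types follow the names proposed in the comment and are
// not copied from the Flink code base.
enum CheckpointFailureReason {
    CHECKPOINT_EXPIRED("Checkpoint expired before completing"),
    CHECKPOINT_SUBSUMED("Checkpoint has been subsumed"),
    CHECKPOINT_DECLINED("Checkpoint was declined (tasks not ready)"),
    CHECKPOINT_ERROR("Checkpoint failed");

    private final String message;

    CheckpointFailureReason(String message) {
        this.message = message;
    }

    // Human-readable description of the failure (assumed accessor name).
    String message() {
        return message;
    }
}

// Hypothetical holder for the outcome of an asynchronously triggered
// checkpoint: either a success, or a failure carrying a classified reason.
final class CheckpointInvokeResult {
    private final CheckpointFailureReason failureReason; // null on success

    private CheckpointInvokeResult(CheckpointFailureReason failureReason) {
        this.failureReason = failureReason;
    }

    static CheckpointInvokeResult success() {
        return new CheckpointInvokeResult(null);
    }

    static CheckpointInvokeResult failure(CheckpointFailureReason reason) {
        return new CheckpointInvokeResult(java.util.Objects.requireNonNull(reason));
    }

    boolean isFailure() {
        return failureReason != null;
    }

    CheckpointFailureReason getFailureReason() {
        if (failureReason == null) {
            throw new IllegalStateException("Checkpoint did not fail");
        }
        return failureReason;
    }
}
```

With a classification like this, a failure counter such as the one discussed for FLINK-10074 could inspect isFailure() and getFailureReason() in a single place instead of handling each error path separately.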