[ https://issues.apache.org/jira/browse/FLINK-18336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijiang updated FLINK-18336:
-----------------------------
    Fix Version/s: 1.12.0

> CheckpointFailureManager forgets failed checkpoints after a successful one
> --------------------------------------------------------------------------
>
>                 Key: FLINK-18336
>                 URL: https://issues.apache.org/jira/browse/FLINK-18336
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>            Reporter: Roman Khachatryan
>            Assignee: Roman Khachatryan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>
> To my understanding, a failure shouldn't be counted more than once for a
> single checkpoint.
> However, after a successful checkpoint, all previously recorded failures are
> cleared, so a repeated failure report for an earlier checkpoint is counted
> again.
> So this test will currently fail:
>
> {code:java}
> TestFailJobCallback callback = new TestFailJobCallback();
> CheckpointFailureManager failureManager = new CheckpointFailureManager(2, callback);
> failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 1L);
> failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 2L);
> failureManager.handleCheckpointSuccess(2L);
> failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 3L);
> failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 4L);
> // shouldn't be counted because 1L has already failed:
> failureManager.handleJobLevelCheckpointException(new CheckpointException(CHECKPOINT_EXPIRED), 1L);
> assertEquals(0, callback.getInvokeCounter());
> {code}
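
The behavior described in the issue can be reproduced with a small standalone model. This is only a sketch under stated assumptions, not Flink's CheckpointFailureManager: the class FailureCounterSketch, its fields, and the replay helper are hypothetical names introduced for illustration. It assumes the job is failed once the number of distinct failed checkpoints exceeds the tolerable count, and that a success clears both the counter and the set of remembered checkpoint IDs. The ignoreStaleFailures flag shows one possible remedy (dropping failures for checkpoints at or before the last successful one); it is not necessarily the change made for this issue.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of a checkpoint failure counter (not Flink code). */
public class FailureCounterSketch {
    private final int tolerableFailures;
    private final boolean ignoreStaleFailures;
    private final Set<Long> countedCheckpointIds = new HashSet<>();
    private int continuousFailures = 0;
    private long lastSucceededCheckpointId = Long.MIN_VALUE;
    private int failJobInvocations = 0;

    public FailureCounterSketch(int tolerableFailures, boolean ignoreStaleFailures) {
        this.tolerableFailures = tolerableFailures;
        this.ignoreStaleFailures = ignoreStaleFailures;
    }

    public void handleFailure(long checkpointId) {
        // Possible remedy (assumption): a failure for a checkpoint at or before
        // the last successful one is stale and is not counted.
        if (ignoreStaleFailures && checkpointId <= lastSucceededCheckpointId) {
            return;
        }
        // Count each checkpoint ID at most once while it is remembered.
        if (countedCheckpointIds.add(checkpointId)) {
            continuousFailures++;
        }
        if (continuousFailures > tolerableFailures) {
            failJobInvocations++; // stands in for the fail-job callback
            reset();
        }
    }

    public void handleSuccess(long checkpointId) {
        lastSucceededCheckpointId = Math.max(lastSucceededCheckpointId, checkpointId);
        // Clearing the remembered IDs here is what lets an old failure count again.
        reset();
    }

    private void reset() {
        continuousFailures = 0;
        countedCheckpointIds.clear();
    }

    public int getFailJobInvocations() {
        return failJobInvocations;
    }

    /** Replays the sequence from the issue with a tolerable failure count of 2. */
    public static int replay(boolean ignoreStaleFailures) {
        FailureCounterSketch m = new FailureCounterSketch(2, ignoreStaleFailures);
        m.handleFailure(1L);
        m.handleFailure(2L);
        m.handleSuccess(2L);
        m.handleFailure(3L);
        m.handleFailure(4L);
        m.handleFailure(1L); // repeated report for checkpoint 1
        return m.getFailJobInvocations();
    }

    public static void main(String[] args) {
        System.out.println("without stale check: " + replay(false));
        System.out.println("with stale check:    " + replay(true));
    }
}
```

In this model, replaying the sequence without the stale check triggers the fail-job path once, because checkpoint 1 is forgotten after the success of checkpoint 2 and its repeated failure is counted as a third failure. With the stale check enabled, the repeated report is dropped and the job is not failed.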