[ https://issues.apache.org/jira/browse/FLINK-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Till Rohrmann reassigned FLINK-10855:
-------------------------------------

    Assignee:     (was: vinoyang)

> CheckpointCoordinator does not delete checkpoint directory of late/failed checkpoints
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10855
>                 URL: https://issues.apache.org/jira/browse/FLINK-10855
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Major
>              Labels: stale-assigned
>
> If an acknowledge checkpoint message arrives late or a checkpoint cannot be
> acknowledged, we discard the subtask state in the {{CheckpointCoordinator}}.
> What does not happen in this case is deletion of the checkpoint's parent
> directory. That only happens in {{PendingCheckpoint#dispose}}.
> Due to this behaviour, the following can happen: a checkpoint fails (e.g.
> because a task is not ready) and we delete the checkpoint directory. Next,
> another task writes its checkpoint data to the checkpoint directory (thereby
> creating it again) and sends an acknowledge message back to the
> {{CheckpointCoordinator}}. The {{CheckpointCoordinator}} will realize that
> there is no longer a {{PendingCheckpoint}} and will discard the subtask
> state. This will remove the state files from the checkpoint directory but
> will leave the checkpoint directory itself untouched.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
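
The sequence described in the issue can be reproduced in miniature with plain file operations. The sketch below is a toy model, not Flink code: the class LateAckSketch, its method names, and the chk-1 / subtask-1 paths are all hypothetical stand-ins for the behaviour the report attributes to {{PendingCheckpoint#dispose}} and the {{CheckpointCoordinator}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Toy model of the late-acknowledgement sequence; all names are hypothetical. */
class LateAckSketch {

    /** Stand-in for disposing a pending checkpoint: removes state files and the directory. */
    static void disposePendingCheckpoint(Path checkpointDir) throws IOException {
        if (!Files.exists(checkpointDir)) {
            return;
        }
        try (Stream<Path> files = Files.list(checkpointDir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                Files.delete(file);
            }
        }
        Files.delete(checkpointDir);
    }

    /** Stand-in for a task writing its state; recreates the directory if it is missing. */
    static Path writeSubtaskState(Path checkpointDir, String fileName) throws IOException {
        Files.createDirectories(checkpointDir);
        return Files.write(checkpointDir.resolve(fileName), new byte[] {1, 2, 3});
    }

    /** Stand-in for discarding late subtask state: deletes only the state file. */
    static void discardSubtaskState(Path stateFile) throws IOException {
        Files.deleteIfExists(stateFile);
    }

    /** Replays the sequence and reports whether an empty checkpoint directory remains. */
    static boolean leavesEmptyDirectory() {
        try {
            Path root = Files.createTempDirectory("checkpoints");
            Path checkpointDir = root.resolve("chk-1");
            Files.createDirectories(checkpointDir);

            // 1. The checkpoint fails and its directory is deleted.
            disposePendingCheckpoint(checkpointDir);

            // 2. A slow task writes its state, which recreates the directory.
            Path lateState = writeSubtaskState(checkpointDir, "subtask-1");

            // 3. The late acknowledgement finds no pending checkpoint;
            //    only the state file is discarded.
            discardSubtaskState(lateState);

            try (Stream<Path> remaining = Files.list(checkpointDir)) {
                return Files.isDirectory(checkpointDir) && !remaining.findAny().isPresent();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("empty directory left behind: " + leavesEmptyDirectory());
    }
}
```

In this model the directory survives step 3 because nothing after the recreation in step 2 is responsible for removing it, which is the gap the issue describes. The temporary directory is not cleaned up, so the leftover can be inspected by hand.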