[ https://issues.apache.org/jira/browse/FLINK-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844829#comment-16844829 ]
Stefan Richter commented on FLINK-10855:
----------------------------------------

[~yanghua] Just a heads-up: you pinged the wrong Stefan Richter in your question, which may be why you got a confusing answer.

> CheckpointCoordinator does not delete checkpoint directory of late/failed checkpoints
> -------------------------------------------------------------------------------------
>
> Key: FLINK-10855
> URL: https://issues.apache.org/jira/browse/FLINK-10855
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.5.5, 1.6.2, 1.7.0
> Reporter: Till Rohrmann
> Assignee: vinoyang
> Priority: Major
>
> If an acknowledge checkpoint message arrives late, or a checkpoint cannot
> be acknowledged, we discard the subtask state in the {{CheckpointCoordinator}}.
> What we do not do in this case is delete the parent directory of the
> checkpoint. That only happens when a {{PendingCheckpoint}} is disposed via
> {{PendingCheckpoint#dispose}}.
> Because of this behaviour, a checkpoint can fail (e.g. because a task is not
> ready) and we delete the checkpoint directory. Next, another task writes its
> checkpoint data to the checkpoint directory (thereby creating it again) and
> sends an acknowledge message back to the {{CheckpointCoordinator}}. The
> {{CheckpointCoordinator}} realizes that there is no longer a
> {{PendingCheckpoint}} and discards the subtask state. This removes the state
> files from the checkpoint directory but leaves the checkpoint directory itself
> untouched.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
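The race described in the issue can be reproduced with a small, hypothetical simulation on a local temporary directory. This is not Flink code: the class and method names (LateAckDirectoryLeak, disposePendingCheckpoint, writeSubtaskState, discardSubtaskState, discardSubtaskStateAndEmptyParent) are assumptions standing in for PendingCheckpoint#dispose, a task writing its checkpoint data, and the CheckpointCoordinator discarding a late acknowledgement. The "fix" method (delete the parent directory if it is empty after discarding) is likewise only an illustration, not the change made for FLINK-10855. It uses only java.nio.file and needs Java 11+ (Files.writeString, Optional#isEmpty).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical simulation of the FLINK-10855 race; none of these methods exist in Flink.
public class LateAckDirectoryLeak {

    // Stand-in for PendingCheckpoint#dispose: deletes the whole checkpoint directory.
    static void disposePendingCheckpoint(Path checkpointDir) throws IOException {
        deleteRecursively(checkpointDir);
    }

    // Stand-in for a task writing its state; createDirectories recreates a deleted directory.
    static Path writeSubtaskState(Path checkpointDir, String fileName) throws IOException {
        Files.createDirectories(checkpointDir);
        return Files.writeString(checkpointDir.resolve(fileName), "state");
    }

    // Stand-in for discarding a late acknowledgement: only the state file is removed.
    static void discardSubtaskState(Path stateFile) throws IOException {
        Files.deleteIfExists(stateFile);
    }

    // Hypothetical fix: after discarding, also remove the parent directory if it is empty.
    static void discardSubtaskStateAndEmptyParent(Path stateFile) throws IOException {
        Files.deleteIfExists(stateFile);
        Path parent = stateFile.getParent();
        boolean empty;
        try (Stream<Path> entries = Files.list(parent)) {
            empty = entries.findAny().isEmpty();
        }
        if (empty) {
            Files.delete(parent); // throws DirectoryNotEmptyException if a writer raced us
        }
    }

    // Deletes a directory tree, children before parents; no-op if it does not exist.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("checkpoints");
        Path chk = root.resolve("chk-42");

        // 1. The checkpoint fails and is disposed, which deletes its directory.
        writeSubtaskState(chk, "task-a");
        disposePendingCheckpoint(chk);

        // 2. A late task writes its state (recreating the directory) and acknowledges.
        Path lateState = writeSubtaskState(chk, "task-b");

        // 3. No PendingCheckpoint exists any more, so only the state file is discarded.
        discardSubtaskState(lateState);
        System.out.println("leaked directory exists: " + Files.exists(chk));

        // Same late acknowledgement, handled with the hypothetical fix.
        Path lateState2 = writeSubtaskState(chk, "task-c");
        discardSubtaskStateAndEmptyParent(lateState2);
        System.out.println("directory exists after fix: " + Files.exists(chk));

        deleteRecursively(root);
    }
}
```

Running main prints "leaked directory exists: true" followed by "directory exists after fix: false". Note that the empty-directory check in the sketch is not atomic: a late write between the listing and the delete makes Files.delete throw DirectoryNotEmptyException, and a write just after it recreates the directory, so this approach only narrows the window rather than closing it.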