JesseAtSZ commented on PR #20091: URL: https://github.com/apache/flink/pull/20091#issuecomment-1173787366
> Thanks for the PR @JesseAtSZ,
>
> I'm trying to understand why checkpoints are failing with `FINALIZE_CHECKPOINT_FAILURE` (which is ignored by `CheckpointFailureManager`) and not something like `IOException`. From the code, it might happen only in `CheckpointCoordinator` - when all the tasks have already acknowledged the checkpoint. That probably means that the job is stateless. Could you confirm that @JesseAtSZ?
>
> If that's NOT the case then we probably should fix failure counting first.
>
> Another question is related to the TM - do we need a symmetric check there? (in `FsCheckpointStorageAccess.resolveCheckpointStorageLocation`).

I read the code: `FINALIZE_CHECKPOINT_FAILURE` only occurs in the checkpoint coordinator. In addition, I don't think it is necessary to check the path on the TM, because the initialization in the coordinator happens before `performCheckpoint`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org