rkhachatryan commented on pull request #14635: URL: https://github.com/apache/flink/pull/14635#issuecomment-763414913
Thanks a lot for trying it out.

> I think it's strictly necessary to:
> clearly mark which checkpoint for which subtask has failed

It is not always the task that fails a checkpoint. The timeout decision is made by the `CheckpointCoordinator`. Multiple tasks can fail independently as well.
I agree that marking "failed" tasks would be useful, but I don't think it's directly related to this feature, or at least to this PR.

> if we were not able to collect/calculate a metric, it must be N/A - not just 0ms

I don't see `0ms` in your screenshots or when running locally. Do you mean `0 B` per operator? If so, why is it incorrect? (I do see a non-zero size on a running cluster.)

> correctly calculate the durations (end to end, sync, async, etc...) also for failed checkpoints, not just N/A

A checkpoint can be cancelled before it has even started on some subtasks, so these durations do not always exist for a failed checkpoint.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org