rkhachatryan commented on pull request #14635:
URL: https://github.com/apache/flink/pull/14635#issuecomment-763414913


   Thanks a lot for trying it out.
   > I think it's strictly necessary to:
   > clearly mark which checkpoint for which subtask has failed
   
   It is not always a task that fails a checkpoint. For example, the timeout 
decision is made by the `CheckpointCoordinator`.
   Multiple tasks can also fail independently.
   I agree that marking "failed" tasks would be useful, but I don't think it's 
directly related to this feature, or at least not to this PR.
   
   > if we were not able to collect/calculate a metric, it must be N/A - not 
just 0ms
   
   I don't see `0ms` in your screenshots, nor when running locally. Do you mean 
`0 B` per operator? 
   If so, why is it incorrect? (I do see a non-zero size when running on a cluster.)
   
   > correctly calculate the durations (end to end, sync, async, etc...) also 
for failed checkpoints, not just N/A
   
   A checkpoint can be cancelled before even being started on some subtasks. 
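   For such subtasks there is no start timestamp, so a duration cannot be 
calculated at all. A minimal sketch of that distinction follows. `SubtaskTiming` 
and its methods are hypothetical names used only for illustration; they are not 
Flink's actual checkpoint stats classes.

```java
import java.util.OptionalLong;

// Hypothetical model for illustration only; not part of Flink.
final class SubtaskTiming {
    // Empty when the checkpoint was never started on this subtask.
    private final OptionalLong startMillis;
    // Empty when the subtask never finished its part of the checkpoint.
    private final OptionalLong endMillis;

    private SubtaskTiming(OptionalLong startMillis, OptionalLong endMillis) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    // Checkpoint was cancelled before it reached this subtask.
    static SubtaskTiming neverStarted() {
        return new SubtaskTiming(OptionalLong.empty(), OptionalLong.empty());
    }

    // Checkpoint started on this subtask but was cancelled before completion.
    static SubtaskTiming startedOnly(long startMillis) {
        return new SubtaskTiming(OptionalLong.of(startMillis), OptionalLong.empty());
    }

    // Checkpoint started and finished on this subtask.
    static SubtaskTiming finished(long startMillis, long endMillis) {
        return new SubtaskTiming(OptionalLong.of(startMillis), OptionalLong.of(endMillis));
    }

    // A duration exists only when both timestamps were recorded.
    OptionalLong durationMillis() {
        if (startMillis.isPresent() && endMillis.isPresent()) {
            return OptionalLong.of(endMillis.getAsLong() - startMillis.getAsLong());
        }
        return OptionalLong.empty();
    }

    // Renders a measured duration (including a genuine 0ms) or "N/A" when unknown.
    String display() {
        OptionalLong duration = durationMillis();
        return duration.isPresent() ? duration.getAsLong() + "ms" : "N/A";
    }
}
```

   In this sketch a measured `0ms` and an unknown duration stay distinct, and a 
subtask that never saw the checkpoint can only report `N/A`.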



