[GitHub] [flink] rkhachatryan commented on pull request #14635: [FLINK-19462][checkpointing] Update failed checkpoint stats

2021-01-25 Thread GitBox


rkhachatryan commented on pull request #14635:
URL: https://github.com/apache/flink/pull/14635#issuecomment-766707720


   Thanks for the review, @pnowojski.
   I've added the space and created a ticket to translate the docs.
   I've also squashed the commits.
   
   > for example AsyncCheckpointRunnable fails (throws an exception), I can not see any stats for any subtasks that have finished after the failure
   
   As discussed offline, this happens because the failed upstream doesn't send a barrier downstream.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rkhachatryan commented on pull request #14635: [FLINK-19462][checkpointing] Update failed checkpoint stats

2021-01-22 Thread GitBox


rkhachatryan commented on pull request #14635:
URL: https://github.com/apache/flink/pull/14635#issuecomment-765417912


   I've updated the PR (adding 4 new commits):
   1. Tasks reporting upon abort RPC are marked as `aborted` in the e2e duration column
   2. Only tasks that actually ACKed the checkpoint are counted for ackCount and lastAckTime
   3. `-1B` is shown as `-` (the same way as durations)
   4. Fix the docs
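
The counting and display rules in the list above can be illustrated with a minimal sketch. This is not code from the PR or from Flink: `SubtaskReport`, `summarize_acks`, `format_size`, and `format_e2e_duration` are hypothetical names, and the `-1` sentinel for an unavailable size is an assumption based on the `-1B` mentioned in item 3.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class SubtaskReport:
    """Hypothetical per-subtask report; not a Flink class."""
    subtask_index: int
    acked: bool                 # True only if the subtask actually ACKed the checkpoint
    report_time_ms: int         # when the report arrived
    state_size_bytes: int = -1  # assumption: -1 means "not available"


def summarize_acks(reports: Iterable[SubtaskReport]) -> Tuple[int, Optional[int]]:
    """Return (ack count, last ack time), ignoring reports sent upon abort."""
    acked_times = [r.report_time_ms for r in reports if r.acked]
    last_ack = max(acked_times) if acked_times else None
    return len(acked_times), last_ack


def format_size(size_bytes: int) -> str:
    """Show an unavailable size as '-' instead of '-1B'."""
    return "-" if size_bytes < 0 else f"{size_bytes} B"


def format_e2e_duration(report: SubtaskReport, trigger_time_ms: int) -> str:
    """Mark subtasks that reported upon abort as 'aborted' in the duration column."""
    if not report.acked:
        return "aborted"
    return f"{report.report_time_ms - trigger_time_ms} ms"
```

In this sketch a subtask that reports upon abort still appears in the table, but it contributes neither to the ack count nor to the last ack time.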
   
   
![image](https://user-images.githubusercontent.com/3939322/105499876-669e6700-5cc2-11eb-8d99-b301a83a548c.png)
   
   cc: @NicoK







[GitHub] [flink] rkhachatryan commented on pull request #14635: [FLINK-19462][checkpointing] Update failed checkpoint stats

2021-01-20 Thread GitBox


rkhachatryan commented on pull request #14635:
URL: https://github.com/apache/flink/pull/14635#issuecomment-763414913


   Thanks a lot for trying it out.
   > I think it's strictly necessary to:
   > clearly mark which checkpoint for which subtask has failed
   
   It is not always the task that fails a checkpoint. The timeout decision is made by the `CheckpointCoordinator`.
   Multiple tasks can fail independently as well.
   I agree that marking "failed" tasks would be useful, but I don't think it's directly related to this feature, or at least to this PR.
   
   > if we were not able to collect/calculate a metric, it must be N/A - not 
just 0ms
   
   I don't see `0ms` in your screenshots or when running locally. Do you mean `0 B` per operator?
   If so, why is it incorrect? (I do see a non-zero size when running on a cluster.)
   
   > correctly calculate the durations (end to end, sync, async, etc...) also 
for failed checkpoints, not just N/A
   
   A checkpoint can be cancelled before it has even started on some subtasks, so there are no durations to calculate for those subtasks.
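
The two points made in this reply can be sketched as follows, under stated assumptions. This is an illustration only, not the `CheckpointCoordinator` implementation: `timed_out` and `e2e_durations` are hypothetical helpers, and subtasks are identified by plain integer indices.

```python
from typing import Dict, Optional, Set


def timed_out(trigger_ms: int, now_ms: int, timeout_ms: int,
              expected: Set[int], acked: Set[int]) -> bool:
    """Coordinator-side decision: the checkpoint fails on timeout
    even though no individual task has reported a failure."""
    still_pending = not expected.issubset(acked)
    return still_pending and (now_ms - trigger_ms >= timeout_ms)


def e2e_durations(trigger_ms: int, expected: Set[int],
                  report_times_ms: Dict[int, int]) -> Dict[int, Optional[int]]:
    """End-to-end duration per subtask; None (shown as N/A) for subtasks
    on which the checkpoint never started before it was cancelled."""
    return {
        subtask: (report_times_ms[subtask] - trigger_ms
                  if subtask in report_times_ms else None)
        for subtask in sorted(expected)
    }
```

With three expected subtasks and a single report, only that subtask gets a duration; the other two have nothing to measure, which is why N/A rather than a computed value is the honest output for them.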







[GitHub] [flink] rkhachatryan commented on pull request #14635: [FLINK-19462][checkpointing] Update failed checkpoint stats

2021-01-14 Thread GitBox


rkhachatryan commented on pull request #14635:
URL: https://github.com/apache/flink/pull/14635#issuecomment-760328916


   Thanks for reviewing, @pnowojski.
   I've addressed your feedback, PTAL.


