[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-10-30 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-434257204
 
 
   @azagrebin OK, agree.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-10-30 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-434220095
 
 
   Hi @azagrebin  I am currently busy with several other PRs and hope it will 
be merged into Flink 1.7.0. When those PRs are fixed, I will come back and 
reorganize it. I think some logic of this PR is available, but the counting 
logic may need to be refactored. Can we put it aside for some time?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-24 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-423940208
 
 
   @azagrebin thanks for your suggestion, I agree and reconsider more details.
   
   Considering that @tillrohrmann  has said that the current checkpoint 
exception handler should also be implemented in `CheckpointCoordinator`. 
   
   > I understand why you implemented it the way you did. I think the 
`setFailOnCheckpointingErrors` should actually also be handled by the 
`CheckpointCoordinator`/`JM` and not on the Task level (but this is a different 
story). This looks a little bit like a shortcut we made back in the days.
   
   Do we need to take this refactoring into account? Because this PR is 
actually a supplement to the checkpoint exception handler.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-21 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-423453821
 
 
   hi @tillrohrmann what do you think about the latest implementation?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-18 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-422427072
 
 
   hi @tillrohrmann I have refactored this PR and counted the failure number in 
the `CheckpointCoordinator`. I think I should push the implementation to let 
you estimate. I have tested the counter's run path, but I don't know if it is 
the [right 
way](https://github.com/apache/flink/pull/6567/files#diff-a38ea0fa799bdaa0b354d80cd8368c60R1010)
 of failing the `ExecutionGraph` . And maybe the test case I added have too 
much assert, I will reduce it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-17 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-421948370
 
 
   @tillrohrmann I originally wanted to do it in the checkpoint coordinator 
when I first implemented it, but when I touched setFailOnCheckpointingErrors, I 
was guided and turned to the TM. I have also thought about this issue recently 
and it seems that it is more reasonable to implement on the checkpoint 
coordinator. Well, I will implement it again based on the checkpoint 
coordinator.
   
   To @klion26 thanks for your opinion, will change the implementation.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-17 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-421931626
 
 
   @tweise Any comments or opinions? The current count implementation is for a 
single sub task instance, not the job level. Maybe @tillrohrmann  's idea is 
more reasonable?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-14 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-421282760
 
 
   Hi @tillrohrmann , Currently it is applied to the sub task instance (TM) 
instead of the checkpoint coordinator (JM) as a complement to the 
setFailOnCheckpointingErrors method, since setFailOnCheckpointingErrors only 
provides the user with yes or no choices.
   
   But it sounds reasonable to apply the count on the Job globally.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-06 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-419137130
 
 
   @azagrebin thanks for your suggestion, refactored this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-06 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-419073463
 
 
   @tillrohrmann and @zentol if you have time, can you have a look at this PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-09-04 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-418295546
 
 
   @azagrebin thanks for your suggestion, I have refactored this PR, please 
review again~


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-08-22 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-415303507
 
 
   @tillrohrmann When you have free time, please review this PR. thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint failures

2018-08-16 Thread GitBox
yanghua commented on issue #6567: [FLINK-10074] Allowable number of checkpoint 
failures
URL: https://github.com/apache/flink/pull/6567#issuecomment-413486884
 
 
   cc @tillrohrmann 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services