[ 
https://issues.apache.org/jira/browse/FLINK-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654688#comment-16654688
 ] 

ASF GitHub Bot commented on FLINK-10319:
----------------------------------------

TisonKun commented on issue #6680: [FLINK-10319] [runtime] Too many 
requestPartitionState would crash JM
URL: https://github.com/apache/flink/pull/6680#issuecomment-430886993
 
 
   As for "deploying tasks in topological order", I agree that it could help. It is 
an orthogonal improvement, though.
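To make the "topological order" idea concrete, here is a minimal sketch. It assumes the job graph is given as a plain map from each vertex to its downstream consumers, which is a hypothetical stand-in and not Flink's ExecutionGraph API. Kahn's algorithm yields an order in which every producer is deployed before its consumers, so a consumer's first partition request is more likely to find its producer already deployed.

```java
import java.util.*;

public class TopologicalDeploy {
    // Hypothetical job graph: vertex name -> direct downstream consumers.
    // Returns an order in which every producer precedes its consumers
    // (Kahn's algorithm); throws if the graph contains a cycle.
    static List<String> deploymentOrder(Map<String, List<String>> consumers) {
        // Count how many upstream producers each vertex depends on.
        Map<String, Integer> inDegree = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : consumers.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String c : e.getValue()) {
                inDegree.merge(c, 1, Integer::sum);
            }
        }
        // Sources (no upstream producers) can be deployed first.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v); // all producers of v are already in the order
            for (String c : consumers.getOrDefault(v, List.of())) {
                if (inDegree.merge(c, -1, Integer::sum) == 0) {
                    ready.add(c);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("job graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("source", List.of("map"));
        graph.put("map", List.of("sink"));
        graph.put("sink", List.of());
        System.out.println(deploymentOrder(graph)); // [source, map, sink]
    }
}
```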
   
   As for your hesitancy, I'd like to learn in which situations a downstream 
operator would not be failed by a failing upstream operator. To keep the state 
clean, either the upstream failure fails the downstream and both restore from 
the latest checkpoint, or we need to implement a failover strategy that takes 
responsibility for reconciling the state. The latter sounds quite costly.
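A minimal sketch of the first option, assuming the same kind of hypothetical vertex-to-consumers map: when a task fails, it and everything transitively downstream of it are failed together and restored from the latest checkpoint. `tasksToRestart` is an illustrative name, not part of Flink, and the sketch ignores how restarted consumers get their input data replayed.

```java
import java.util.*;

public class DownstreamFailover {
    // Hypothetical job graph: vertex name -> direct downstream consumers.
    // Returns the failed vertex plus every vertex reachable downstream of it,
    // i.e. the set that would be failed and restored together.
    static Set<String> tasksToRestart(Map<String, List<String>> consumers, String failed) {
        Set<String> toRestart = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(failed);
        while (!pending.isEmpty()) {
            String v = pending.poll();
            // Visit each vertex once, even if several paths lead to it.
            if (toRestart.add(v)) {
                pending.addAll(consumers.getOrDefault(v, List.of()));
            }
        }
        return toRestart;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "source", List.of("map"),
                "map", List.of("sink"),
                "sink", List.of(),
                "other", List.of("sink"));
        System.out.println(tasksToRestart(graph, "map")); // [map, sink]
    }
}
```

Vertices that are neither the failed task nor downstream of it ("source" and "other" above) keep running in this model.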



> Too many requestPartitionState would crash JM
> ---------------------------------------------
>
>                 Key: FLINK-10319
>                 URL: https://issues.apache.org/jira/browse/FLINK-10319
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination
>    Affects Versions: 1.7.0
>            Reporter: TisonKun
>            Assignee: TisonKun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> Do not call requestPartitionState on the JM when a partition request fails;
> doing so may generate too many RPC requests and block the JM.
> We gain little benefit from checking which state the producer is in, while on
> the other hand the flood of RPC requests can crash the JM. A task can always
> call retriggerPartitionRequest on its InputGate: the request fails if the
> producer is gone and succeeds if the producer is alive. Either way, there is
> no need to ask the JM for help.
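The proposal above can be sketched as a local retry loop on the consumer side. This is a hypothetical illustration, not Flink's InputGate code: `requestWithRetries`, the `BooleanSupplier` standing in for one partition-request attempt, and the exponential backoff parameters are all assumptions. The point is that no RPC to the JM is involved: a live producer eventually answers, and a vanished one makes the retries run out so the task fails.

```java
import java.util.function.BooleanSupplier;

public class PartitionRequestRetry {
    // Hypothetical sketch: retry a failed partition request locally with
    // exponential backoff instead of asking the JM for the producer's state.
    // `request` stands in for one attempt (true = partition connected).
    static boolean requestWithRetries(BooleanSupplier request, int maxRetries,
                                      long initialBackoffMs, long maxBackoffMs) {
        long backoff = initialBackoffMs;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (request.getAsBoolean()) {
                return true; // producer is alive: consumption can start
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep interrupt status
                    return false;
                }
                backoff = Math.min(backoff * 2, maxBackoffMs);
            }
        }
        return false; // producer presumably gone: the caller fails the task
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated producer that becomes reachable on the third attempt.
        BooleanSupplier slowProducer = () -> ++attempts[0] >= 3;
        System.out.println(requestWithRetries(slowProducer, 5, 10, 100)); // true
        System.out.println(requestWithRetries(() -> false, 2, 10, 100));  // false
    }
}
```

Bounding the retries keeps a consumer of a vanished producer from waiting forever; the resulting task failure can then be handled by whatever failover strategy is configured.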



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
