[ 
https://issues.apache.org/jira/browse/FLINK-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153429#comment-16153429
 ] 

ASF GitHub Bot commented on FLINK-7580:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/4643

    [FLINK-7580] Automatically retry failed gateway retrievals

    ## What is the purpose of the change
    
    The LeaderGatewayRetriever implementations, AkkaJobManagerRetriever and
    RpcGatewayRetriever, now automatically retry the gateway retrieval operation
    a fixed number of times, with a delay between attempts, before completing
    the gateway future with an exception.
    
    ## Verifying this change
    
    This change is already covered by existing tests, such as `FutureUtilsTest`
    (`retryWithDelay` tests).
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink retryingLeaderGatewayRetrieverImpl

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4643
    
----
commit 15358b4acaef2b2a84e23cf21dcade2014df4abb
Author: Till Rohrmann <[email protected]>
Date:   2017-08-15T10:00:58Z

    [FLINK-7458] Generalize GatewayRetriever for WebRuntimeMonitor
    
    Introduce a generalized GatewayRetriever replacing the JobManagerRetriever.
    The GatewayRetriever fulfills the same purpose as the JobManagerRetriever
    with the ability to retrieve the gateway for an arbitrary endpoint type.

commit 1386ea4f56f86970cb0dc8af783144c35fe1e3f3
Author: Till Rohrmann <[email protected]>
Date:   2017-08-15T11:55:47Z

    [FLINK-7459] Generalize Flink's redirection logic
    
    Introduce RedirectHandler which can be extended to add redirection
    functionality to all SimpleInboundChannelHandlers. This allows sharing the
    same functionality across the StaticFileServerHandler and the
    RuntimeMonitorHandlerBase, which could now be removed.
    In the future, the AbstractRestHandler will also extend the RedirectHandler.

commit 9eaf8f6227f571e898fe28b61d89173416bda129
Author: Till Rohrmann <[email protected]>
Date:   2017-08-18T12:29:29Z

    [FLINK-7533] Let LeaderGatewayRetriever retry failed gateway retrievals
    
    Add test case
    
    Only log LeaderGatewayRetriever exception on Debug log level
    
    Properly fail outdated gateway retrieval operations

commit 42cc51b5db800c6776c2e398ea2cae0651b2d49c
Author: Till Rohrmann <[email protected]>
Date:   2017-09-04T14:42:24Z

    [FLINK-7576] [futures] Add FutureUtils.retryWithDelay
    
    FutureUtils.retryWithDelay executes the given operation of type
    Callable<CompletableFuture<T>> n times and waits the given delay between
    retries. This makes it possible to retry an operation with a specified delay.
    
    Make retry and retry with delay future properly cancellable

commit a717cc616af2b3a24fbeb9b70137e0401ea24507
Author: Till Rohrmann <[email protected]>
Date:   2017-09-04T15:57:08Z

    [FLINK-7580] Automatically retry failed gateway retrievals
    
    The LeaderGatewayRetriever implementations, AkkaJobManagerRetriever and the
    RpcGatewayRetriever, now automatically retry the gateway retrieval operation
    for a fixed number of times with a retry delay before completing the gateway
    future with an exception.
    
    Retry AkkaJobManagerRetriever
    
    Retry RpcGatewayRetriever

----


> Let LeaderGatewayRetriever implementations automatically retry failed gateway retrieval operations
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-7580
>                 URL: https://issues.apache.org/jira/browse/FLINK-7580
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination, Webfrontend
>    Affects Versions: 1.4.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> The {{LeaderGatewayRetriever}} implementations {{AkkaJobManagerRetriever}} 
> and {{RpcGatewayRetriever}} should automatically retry failed gateway 
> retrieval operations. A retrieval can fail, for example, if the 
> {{WebRuntimeMonitor}} is started before the actual Akka/RPC component. I 
> would propose to retry a fixed number of times with a short delay in 
> between. If the retrieval still fails once the retries are exhausted, 
> FLINK-7533 ensures that a new retrieval operation is started the next time 
> information is requested from the {{WebRuntimeMonitor}}. This ensures that 
> the retry operation won't run forever but also that it will eventually 
> connect to the Akka/RPC component if it exists.
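
For illustration only, the retry scheme described above (a fixed number of retries with a delay in between, then failing the future) could be sketched roughly as follows. This is not the code from the pull request and not Flink's `FutureUtils.retryWithDelay`. The class `RetrySketch`, its method signature, and the use of `Supplier` instead of `Callable` are assumptions made for this example, as is the reading of `retries` as the number of additional attempts after the first one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch of retry-with-delay; not Flink's actual implementation. */
final class RetrySketch {

    private RetrySketch() {}

    /**
     * Runs the operation once and, on failure, retries it up to {@code retries}
     * more times, waiting {@code delayMillis} before each retry. The returned
     * future fails with the last exception once the retries are exhausted.
     */
    static <T> CompletableFuture<T> retryWithDelay(
            Supplier<CompletableFuture<T>> operation,
            int retries,
            long delayMillis,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, retries, delayMillis, scheduler, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> operation,
            int retriesLeft,
            long delayMillis,
            ScheduledExecutorService scheduler,
            CompletableFuture<T> result) {
        // Stop early if the caller already completed or cancelled the result.
        if (result.isDone()) {
            return;
        }
        CompletableFuture<T> current;
        try {
            current = operation.get();
        } catch (Throwable t) {
            // Treat a synchronous failure like a failed future.
            current = new CompletableFuture<>();
            current.completeExceptionally(t);
        }
        current.whenComplete((value, failure) -> {
            if (failure == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                // Schedule the next attempt after the configured delay.
                scheduler.schedule(
                        () -> attempt(operation, retriesLeft - 1, delayMillis, scheduler, result),
                        delayMillis,
                        TimeUnit.MILLISECONDS);
            } else {
                // Retries exhausted: surface the last failure to the caller.
                result.completeExceptionally(failure);
            }
        });
    }
}
```

In this sketch a caller would pass the gateway lookup as the `operation` and a shared `ScheduledExecutorService` as the `scheduler`. Because the result future fails after the last retry, the caller stays free to start a fresh retrieval on the next request, which is the behaviour the issue attributes to FLINK-7533.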


