[jira] [Commented] (FLINK-4152) TaskManager registration exponential backoff doesn't work

ASF GitHub Bot (JIRA) Wed, 20 Jul 2016 04:42:04 -0700

    [ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385711#comment-15385711 ]


ASF GitHub Bot commented on FLINK-4152:
---------------------------------------

Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/2257
  
    Hi @mxm, I've changed the implementation so that we no longer need the 
`containersLaunched` map in the `YarnFlinkResourceManager`. Instead, the 
`FlinkResourceManager` no longer clears its `registeredWorkers` map when the 
`JobManager` loses leadership. Thus, the `registeredWorkers` map denotes the 
successfully started task managers (and the containers they are running in).
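To make the bookkeeping change concrete, here is a minimal Java sketch of a resource manager that forgets the leading JobManager when leadership is lost but keeps its registered workers. The class `WorkerBookkeeping` and all of its methods are hypothetical simplifications for illustration, not Flink's `FlinkResourceManager` API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of the bookkeeping described above.
// It is NOT Flink's FlinkResourceManager; names and methods are illustrative.
class WorkerBookkeeping {
    // Resource ID of a successfully started task manager -> its container ID.
    private final Map<String, String> registeredWorkers = new HashMap<>();
    // Current leading JobManager, or null while there is no leader.
    private String leaderId;

    void onNewLeader(String newLeaderId) {
        leaderId = newLeaderId;
    }

    void onWorkerStarted(String resourceId, String containerId) {
        registeredWorkers.put(resourceId, containerId);
    }

    void onLeadershipLost() {
        // Forget the leader, but deliberately keep registeredWorkers:
        // the task managers and their containers are still running.
        leaderId = null;
    }

    String leaderId() {
        return leaderId;
    }

    int numRegisteredWorkers() {
        return registeredWorkers.size();
    }

    String containerOf(String resourceId) {
        return registeredWorkers.get(resourceId);
    }

    public static void main(String[] args) {
        WorkerBookkeeping rm = new WorkerBookkeeping();
        rm.onNewLeader("jm-1");
        rm.onWorkerStarted("tm-1", "container_01");
        rm.onWorkerStarted("tm-2", "container_02");
        rm.onLeadershipLost();
        // Both workers are still known after leadership was lost.
        System.out.println(rm.numRegisteredWorkers()); // prints 2
    }
}
```

Because the map survives the leadership change, it can serve as the single record of running containers, which is what makes a separate `containersLaunched` map unnecessary in this model.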
    
    Additionally, I reintroduced the reconnect resource manager functionality in 
the job manager. This should ensure that the resource manager is eventually 
notified about newly registered resources. In the current implementation, 
however, the resource manager always accepts the register resource messages, 
so the reconnect resource manager message is only sent if a register resource 
message gets lost and thus triggers a timeout exception.
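The timeout-driven fallback can be sketched in Java 9+ with `CompletableFuture.orTimeout`. This is an assumed model for illustration, not the actual JobManager code: the class `RegisterWithFallback`, the `ack` future standing in for the resource manager's answer, and the message names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch (Java 9+) of the fallback described above. The "ack" future
// stands in for the resource manager's answer to a register resource message;
// the message names are illustrative, not Flink's actual message classes.
class RegisterWithFallback {
    // Messages the job manager side "sent", recorded for inspection.
    final List<String> sent = new ArrayList<>();

    String registerResource(CompletableFuture<String> ack, long timeoutMillis) {
        sent.add("RegisterResource");
        try {
            // orTimeout completes the future exceptionally with a TimeoutException
            // if no answer arrives in time, e.g. because the message was lost.
            return ack.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                // Only in this case is the reconnect message sent.
                sent.add("ReconnectResourceManager");
                return "reconnect";
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        RegisterWithFallback jm = new RegisterWithFallback();
        // The resource manager answers: no reconnect is needed.
        System.out.println(jm.registerResource(CompletableFuture.completedFuture("ack"), 100));
        // The answer never arrives: the timeout triggers the reconnect message.
        System.out.println(jm.registerResource(new CompletableFuture<>(), 50));
        System.out.println(jm.sent);
    }
}
```

Since the resource manager in this model always answers, the reconnect path is exercised only when an answer is lost, matching the behavior described in the comment.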
    
    Would be great if you could take another look at the changes.


> TaskManager registration exponential backoff doesn't work
> ---------------------------------------------------------
>
>                 Key: FLINK-4152
>                 URL: https://issues.apache.org/jira/browse/FLINK-4152
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, TaskManager, YARN Client
>            Reporter: Robert Metzger
>            Assignee: Till Rohrmann
>         Attachments: logs.tgz
>
>
> While testing Flink 1.1, I've found that the TaskManagers are logging many 
> messages when registering at the JobManager.
> This is the log file: 
> https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294
> It's logging more than 3000 messages in less than a minute. I don't think that 
> this is the expected behavior.
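The issue title refers to an exponential backoff between registration attempts: after each failed attempt the pause should grow, up to a cap, which bounds how many registration messages are sent per minute. A minimal Java sketch of capped exponential backoff follows; the class `RegistrationBackoff` and its parameters are hypothetical and do not correspond to Flink's actual code or configuration keys.

```java
// Hypothetical sketch of capped exponential backoff for registration retries.
// Class and parameter names are illustrative, not Flink's configuration keys.
class RegistrationBackoff {
    private final long initialPauseMillis;
    private final long maxPauseMillis;
    private long currentPauseMillis;

    RegistrationBackoff(long initialPauseMillis, long maxPauseMillis) {
        if (initialPauseMillis <= 0 || maxPauseMillis < initialPauseMillis) {
            throw new IllegalArgumentException("require 0 < initial <= max");
        }
        this.initialPauseMillis = initialPauseMillis;
        this.maxPauseMillis = maxPauseMillis;
        this.currentPauseMillis = initialPauseMillis;
    }

    /** Returns the pause before the next attempt, then doubles it (up to the cap). */
    long nextPauseMillis() {
        long pause = currentPauseMillis;
        // Compare against max / 2 first so that doubling cannot overflow.
        currentPauseMillis = currentPauseMillis > maxPauseMillis / 2
                ? maxPauseMillis
                : currentPauseMillis * 2;
        return pause;
    }

    /** Starts over, e.g. after a successful registration. */
    void reset() {
        currentPauseMillis = initialPauseMillis;
    }

    public static void main(String[] args) {
        RegistrationBackoff backoff = new RegistrationBackoff(500, 4000);
        for (int attempt = 1; attempt <= 5; attempt++) {
            long pause = backoff.nextPauseMillis();
            System.out.println("attempt " + attempt + " failed, retrying in " + pause + " ms");
        }
    }
}
```

With an initial pause of 500 ms and a cap of 4000 ms, the pauses run 500, 1000, 2000, 4000, 4000, ... ms. A registration loop would wait `nextPauseMillis()` between attempts and call `reset()` once registration succeeds.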



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
