[ https://issues.apache.org/jira/browse/FLINK-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382036#comment-15382036 ]
ASF GitHub Bot commented on FLINK-4152:
---------------------------------------
Github user mxm commented on the issue:
https://github.com/apache/flink/pull/2257
Thank you for the pull request! Looking at the changes, it seems they could
have been split into two pull requests and JIRA issues: 1) avoiding duplicate
RegisterTaskManager messages, and 2) changing the core behavior of the
ResourceManager.
Concerning 2, I would like to understand why it was necessary to change so
much code. It seems it would have sufficed to change one line of code (not
clearing the bookkeeping on leadership change). I'm not saying your changes
don't make sense, but I don't think they are backed by the original JIRA issue.
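The one-line alternative mentioned here (keeping the bookkeeping instead of clearing it on leadership change), together with ignoring duplicate registrations, can be illustrated with a small Java sketch. `TaskManagerRegistry`, `register`, and `onLeadershipGranted` are hypothetical names, not Flink's ResourceManager API, and the map only stands in for whatever bookkeeping the RM actually keeps.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical bookkeeping for registered TaskManagers; NOT Flink's actual
 * ResourceManager code. Shows the difference between clearing registrations
 * on a leadership change and keeping them.
 */
public class TaskManagerRegistry {
    // Maps a TaskManager's (hypothetical) resource ID to its address.
    private final Map<String, String> registered = new HashMap<>();
    private final boolean clearOnLeadershipChange;

    public TaskManagerRegistry(boolean clearOnLeadershipChange) {
        this.clearOnLeadershipChange = clearOnLeadershipChange;
    }

    /** Returns true only for a new registration; a duplicate message is a no-op. */
    public boolean register(String resourceId, String address) {
        return registered.putIfAbsent(resourceId, address) == null;
    }

    /** Called when this instance (re)gains leadership. */
    public void onLeadershipGranted() {
        if (clearOnLeadershipChange) {
            registered.clear(); // forgets every TaskManager, forcing all to re-register
        }
    }

    public int size() {
        return registered.size();
    }

    public static void main(String[] args) {
        TaskManagerRegistry clearing = new TaskManagerRegistry(true);
        TaskManagerRegistry keeping = new TaskManagerRegistry(false);
        for (TaskManagerRegistry r : new TaskManagerRegistry[] {clearing, keeping}) {
            r.register("tm-1", "akka://host-1");
            r.register("tm-1", "akka://host-1"); // duplicate registration is ignored
            r.onLeadershipGranted();
        }
        System.out.println(clearing.size() + " " + keeping.size()); // prints "0 1"
    }
}
```

In this sketch the only behavioral difference between the two instances is the single `clear()` call, which is the kind of narrow change the review suggests.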
I'm not sure about changing the RM's role in this PR. The RM should be the
authority for allocating new resources. If those resources are not properly
reported back to the RM (e.g., due to message loss), resource allocation won't
work properly.
> TaskManager registration exponential backoff doesn't work
> ---------------------------------------------------------
>
> Key: FLINK-4152
> URL: https://issues.apache.org/jira/browse/FLINK-4152
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination, TaskManager, YARN Client
> Reporter: Robert Metzger
> Assignee: Till Rohrmann
> Attachments: logs.tgz
>
>
> While testing Flink 1.1 I've found that the TaskManagers are logging many
> messages when registering at the JobManager.
> This is the log file:
> https://gist.github.com/rmetzger/0cebe0419cdef4507b1e8a42e33ef294
> It's logging more than 3000 messages in less than a minute. I don't think that
> this is the expected behavior.
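For comparison with the reported 3000+ messages in under a minute, here is a minimal Java sketch of a capped exponential backoff schedule for registration retries. The initial delay (500 ms), cap (30 s), and 60 s window are illustrative assumptions, not Flink's actual settings, and `RegistrationBackoff`/`retrySchedule` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of capped exponential backoff for registration retries.
 * Delay values are assumptions for illustration, not Flink defaults.
 */
public class RegistrationBackoff {
    /** Returns the delays (ms) preceding each retry attempt that fires within windowMs. */
    static List<Long> retrySchedule(long initialMs, long maxMs, long windowMs) {
        List<Long> delays = new ArrayList<>();
        long delay = initialMs;
        long elapsed = 0;
        while (elapsed + delay <= windowMs) {
            delays.add(delay);
            elapsed += delay;
            delay = Math.min(delay * 2, maxMs); // double the pause, but never exceed the cap
        }
        return delays;
    }

    public static void main(String[] args) {
        List<Long> delays = retrySchedule(500, 30_000, 60_000);
        // prints "6 attempts: [500, 1000, 2000, 4000, 8000, 16000]"
        System.out.println(delays.size() + " attempts: " + delays);
    }
}
```

Under these assumed parameters only six attempts fall within the first minute, so thousands of registration messages suggest the delays were not being applied as intended.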
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)