[ https://issues.apache.org/jira/browse/FLINK-6161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359733#comment-16359733 ]

mingleizhang commented on FLINK-6161:
-------------------------------------

Hi [~zjwang], are you still working on this?

> Retry connection in case of a ResourceManager heartbeat timeout
> ---------------------------------------------------------------
>
>                 Key: FLINK-6161
>                 URL: https://issues.apache.org/jira/browse/FLINK-6161
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Distributed Coordination
>    Affects Versions: 1.3.0
>            Reporter: Till Rohrmann
>            Assignee: zhijiang
>            Priority: Major
>              Labels: flip-6
>
> The {{JobMaster}} should try reconnecting to the latest known resource
> manager leader in case of a resource manager heartbeat timeout. Otherwise the
> {{JobMaster}} will only try connecting to a {{ResourceManager}} if the leader
> address information changes. In case of a false positive heartbeat timeout,
> this could break the {{JobMaster}}'s connection to the {{ResourceManager}}
> permanently.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)