[ https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Rukletsov updated MESOS-6676:
---------------------------------------
    Fix Version/s: 1.1.1
          Summary: Always re-link with scheduler during re-registration.  (was: Always re-link with scheduler during re-registration)

> Always re-link with scheduler during re-registration.
> -----------------------------------------------------
>
>                 Key: MESOS-6676
>                 URL: https://issues.apache.org/jira/browse/MESOS-6676
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>              Labels: mesosphere
>             Fix For: 1.1.1, 1.2.0
>
>
> Scenario:
> # Framework registers with master using a non-zero {{failover_timeout}} and
> is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master -> scheduler link. This
> could happen due to some transient network error, e.g., 1-way partition. The
> master sends a {{FrameworkErrorMessage}} to the framework. The master marks
> the framework as disconnected, but keeps the {{Framework*}} for it around in
> {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is
> dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master
> link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the
> master. It does _not_ set the {{force}} flag. This means we follow
> [this code path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771]
> in the master, which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master ->
> scheduler link is never re-established.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
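
The scenario in the issue description can be sketched as a small toy model. This is only an illustration of the reported behavior, not the Mesos master code: {{ToyMaster}}, {{ToyFramework}}, {{linkedAfterReregistration}}, the {{alwaysRelink}} parameter, and the sample framework ID and scheduler address are all hypothetical names introduced here. The model assumes the master tracks live links as a set of peer addresses and that re-registration without {{force}} only marks the framework as connected again.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy model; none of these names come from the Mesos source.
struct ToyFramework {
  std::string pid;  // Scheduler address the master last saw.
  bool connected;   // Whether the master considers the framework connected.
};

struct ToyMaster {
  std::set<std::string> links;                     // Peers with a live link.
  std::map<std::string, ToyFramework> registered;  // Keyed by framework ID.

  void link(const std::string& pid) { links.insert(pid); }

  // Step 1: initial registration establishes the master -> scheduler link.
  void registerFramework(const std::string& id, const std::string& pid) {
    registered[id] = ToyFramework{pid, true};
    link(pid);
  }

  // Step 2: the link breaks. The framework entry is kept, but it is marked
  // as disconnected and the link is gone.
  void exited(const std::string& pid) {
    links.erase(pid);
    for (auto& entry : registered) {
      if (entry.second.pid == pid) {
        entry.second.connected = false;
      }
    }
  }

  // Step 5: re-registration without the force flag. With
  // alwaysRelink == false this mirrors the reported bug: the framework is
  // marked connected, but the link is not re-established.
  void reregisterFramework(const std::string& id,
                           const std::string& pid,
                           bool alwaysRelink) {
    auto it = registered.find(id);
    assert(it != registered.end());  // Framework must already be known.
    it->second.pid = pid;
    it->second.connected = true;
    if (alwaysRelink) {
      link(pid);
    }
  }
};

// Runs the scenario and reports whether the master -> scheduler link
// exists after re-registration.
bool linkedAfterReregistration(bool alwaysRelink) {
  ToyMaster master;
  const std::string id = "framework-1";
  const std::string pid = "scheduler@10.0.0.1:5050";
  master.registerFramework(id, pid);
  master.exited(pid);
  master.reregisterFramework(id, pid, alwaysRelink);
  return master.links.count(pid) > 0;
}
```

In this model, {{linkedAfterReregistration(false)}} leaves the framework marked as connected with no link in place, which is the state the issue describes. Passing {{true}} corresponds to the behavior the summary asks for, where the master re-links on every re-registration.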