[jira] [Updated] (MESOS-6676) Always re-link with scheduler during re-registration.

2017-01-16 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6676:
--
Fix Version/s: 1.0.3

> Always re-link with scheduler during re-registration.
> -
>
> Key: MESOS-6676
> URL: https://issues.apache.org/jira/browse/MESOS-6676
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Neil Conway
> Assignee: Neil Conway
> Labels: mesosphere
> Fix For: 1.1.1, 1.2.0, 1.0.3
>
>
> Scenario:
> # Framework registers with master using a non-zero {{failover_timeout}} and 
> is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master->scheduler link. This 
> could happen due to some transient network error, e.g., 1-way partition. The 
> master sends a {{FrameworkErrorMessage}} to the framework. The master marks 
> the framework as disconnected, but keeps the {{Framework*}} for it around in 
> {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is 
> dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master 
> link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the
> master. It does _not_ set the {{force}} flag. This means we follow [this code
> path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771]
> in the master, which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master ->
> scheduler link is never re-established.
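
The scenario above can be sketched as a toy model. This is only an illustration of the bookkeeping the description implies, not the Mesos master code: the names `ToyMaster`, `always_relink`, `run_scenario` and the scheduler PID string are hypothetical, and the assumption that the {{force}} path relinks is inferred from the description rather than taken from master.cpp.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    pid: str
    connected: bool = True


@dataclass
class ToyMaster:
    """Toy model of the master's bookkeeping (hypothetical, not Mesos code)."""
    always_relink: bool
    frameworks: dict = field(default_factory=dict)  # FrameworkID -> Framework
    links: set = field(default_factory=set)         # PIDs with a live link

    def register(self, framework_id, pid):
        # Step 1: the framework registers and the master links to it.
        self.frameworks[framework_id] = Framework(pid)
        self.links.add(pid)

    def exited(self, pid):
        # Step 2: the link breaks; the framework is marked disconnected
        # but its entry is kept around.
        self.links.discard(pid)
        for framework in self.frameworks.values():
            if framework.pid == pid:
                framework.connected = False

    def reregister(self, framework_id, pid, force=False):
        # Step 5: re-registration. Assumption: only the force path relinks,
        # unless the proposed "always re-link" behaviour is enabled.
        framework = self.frameworks[framework_id]
        if force or self.always_relink:
            self.links.add(pid)
        framework.connected = True


def run_scenario(always_relink):
    """Replay the scenario; return (connected, link_established)."""
    pid = "scheduler@10.0.0.1:5050"  # hypothetical scheduler PID
    master = ToyMaster(always_relink=always_relink)
    master.register("fw-1", pid)
    master.exited(pid)
    master.reregister("fw-1", pid, force=False)
    framework = master.frameworks["fw-1"]
    return framework.connected, framework.pid in master.links


print(run_scenario(always_relink=False))  # (True, False): re-registered, no link
print(run_scenario(always_relink=True))   # (True, True): re-registered and linked
```

In this model, re-registration without {{force}} marks the framework as connected but leaves the set of links empty, which is the state the description reports. Relinking unconditionally during re-registration closes that gap.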



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6676) Always re-link with scheduler during re-registration.

2016-12-22 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6676:
---
Fix Version/s: 1.1.1
  Summary: Always re-link with scheduler during re-registration.  (was: 
Always re-link with scheduler during re-registration)

> Always re-link with scheduler during re-registration.
> -
>
> Key: MESOS-6676
> URL: https://issues.apache.org/jira/browse/MESOS-6676
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Neil Conway
> Assignee: Neil Conway
> Labels: mesosphere
> Fix For: 1.1.1, 1.2.0
>
>
> Scenario:
> # Framework registers with master using a non-zero {{failover_timeout}} and 
> is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master->scheduler link. This 
> could happen due to some transient network error, e.g., 1-way partition. The 
> master sends a {{FrameworkErrorMessage}} to the framework. The master marks 
> the framework as disconnected, but keeps the {{Framework*}} for it around in 
> {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is 
> dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master 
> link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the
> master. It does _not_ set the {{force}} flag. This means we follow [this code
> path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771]
> in the master, which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master ->
> scheduler link is never re-established.





[jira] [Updated] (MESOS-6676) Always re-link with scheduler during re-registration

2016-12-08 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-6676:
---
Target Version/s: 1.1.1, 1.2.0, 1.0.3

> Always re-link with scheduler during re-registration
> 
>
> Key: MESOS-6676
> URL: https://issues.apache.org/jira/browse/MESOS-6676
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Neil Conway
> Assignee: Neil Conway
> Labels: mesosphere
>
> Scenario:
> # Framework registers with master using a non-zero {{failover_timeout}} and 
> is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master->scheduler link. This 
> could happen due to some transient network error, e.g., 1-way partition. The 
> master sends a {{FrameworkErrorMessage}} to the framework. The master marks 
> the framework as disconnected, but keeps the {{Framework*}} for it around in 
> {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is 
> dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master 
> link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the
> master. It does _not_ set the {{force}} flag. This means we follow [this code
> path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771]
> in the master, which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master ->
> scheduler link is never re-established.





[jira] [Updated] (MESOS-6676) Always re-link with scheduler during re-registration

2016-12-07 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-6676:
---
Shepherd: Vinod Kone

> Always re-link with scheduler during re-registration
> 
>
> Key: MESOS-6676
> URL: https://issues.apache.org/jira/browse/MESOS-6676
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Neil Conway
> Assignee: Neil Conway
> Labels: mesosphere
>
> Scenario:
> # Framework registers with master using a non-zero {{failover_timeout}} and 
> is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master->scheduler link. This 
> could happen due to some transient network error, e.g., 1-way partition. The 
> master sends a {{FrameworkErrorMessage}} to the framework. The master marks 
> the framework as disconnected, but keeps the {{Framework*}} for it around in 
> {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is 
> dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master 
> link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the
> master. It does _not_ set the {{force}} flag. This means we follow [this code
> path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771]
> in the master, which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master ->
> scheduler link is never re-established.





[jira] [Updated] (MESOS-6676) Always re-link with scheduler during re-registration

2016-12-02 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-6676:
---
Assignee: (was: Neil Conway)

> Always re-link with scheduler during re-registration
> 
>
> Key: MESOS-6676
> URL: https://issues.apache.org/jira/browse/MESOS-6676
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Neil Conway
> Labels: mesosphere
>
> Scenario:
> # Framework registers with master using a non-zero {{failover_timeout}} and 
> is assigned a FrameworkID.
> # The master sees an {{ExitedEvent}} for the master->scheduler link. This 
> could happen due to some transient network error, e.g., 1-way partition. The 
> master sends a {{FrameworkErrorMessage}} to the framework. The master marks 
> the framework as disconnected, but keeps the {{Framework*}} for it around in 
> {{frameworks.registered}}.
> # The framework doesn't receive the {{FrameworkErrorMessage}} because it is 
> dropped by the network.
> # The scheduler might receive an {{ExitedEvent}} for the scheduler -> master 
> link, but it ignores this anyway (see MESOS-887).
> # The scheduler sees a new-master-detected event and re-registers with the
> master. It does _not_ set the {{force}} flag. This means we follow [this code
> path|https://github.com/apache/mesos/blob/a6bab9015cd63121081495b8291635f386b95a92/src/master/master.cpp#L2771]
> in the master, which does _not_ relink with the scheduler.
> The result is that scheduler re-registration succeeds, but the master ->
> scheduler link is never re-established.


