[ 
https://issues.apache.org/jira/browse/MESOS-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-8341:
---------------------------------

       Resolution: Fixed
         Assignee: Benno Evers
    Fix Version/s: 1.5.1

commit 3eb57cae3674fc835c784cac9eaa63e1aab7ba1c
Author: Benno Evers <bev...@mesosphere.com>
Date:   Tue Jan 2 10:58:23 2018 -0800

    Correctly reset slave status when aborting a registration.
    
    Previously, the slave was not erased from the `registering`
    and `reregistering` sets in the master for some code paths
    that would result in a failed (re-)registration attempt.
    
    This could lead to a state where the reason of the unsuccessful
    (re-)registration attempt is fixed on the agent, but the master
    ignores subsequent attempts because it assumes the previous
    operation is still in progress.
    
    Review: https://reviews.apache.org/r/64506/

> Agent can become stuck in (re-)registering state during upgrades
> ----------------------------------------------------------------
>
>                 Key: MESOS-8341
>                 URL: https://issues.apache.org/jira/browse/MESOS-8341
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benno Evers
>            Assignee: Benno Evers
>             Fix For: 1.5.1
>
>
> Currently, an agent is not erased from the set of (re-)registering
> agents if:
>  - it tries to (re-)register with a malformed version string
>  - it tries to (re-)register with a version lower than the minimum
> supported version
>  - it tries to (re-)register with a domain when the master has no domain
> configured
>  - the operator marks the slave as gone while the (re-)registration is ongoing
> Afterwards, all further (re-)registration attempts with the same agent id
> are discarded, because the master still thinks that the original
> (re-)registration is ongoing.
> Since the most realistic way to encounter this issue is during a cluster
> upgrade, and it resolves itself after a master restart, it is unlikely to be
> reported externally.
> Review: https://reviews.apache.org/r/64506
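
The failure mode described above can be illustrated with a minimal C++ sketch. This is not the Mesos master code: `MasterSketch`, `Outcome`, `registerAgent`, and the `eraseOnFailure` switch are hypothetical names introduced only for illustration, and the real master completes a successful registration asynchronously rather than inline. The sketch only shows why a rejection path that forgets to erase the agent from the in-progress set causes every later attempt to be ignored.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical result of a single (re-)registration attempt.
enum class Outcome { Accepted, Rejected, Ignored };

// Hypothetical, heavily simplified stand-in for the master's bookkeeping.
class MasterSketch {
public:
  // eraseOnFailure == false models the behavior described in the issue,
  // eraseOnFailure == true models the behavior after the fix.
  explicit MasterSketch(bool eraseOnFailure)
    : eraseOnFailure_(eraseOnFailure) {}

  Outcome registerAgent(const std::string& agentId, bool requestIsValid) {
    // insert() reports false in .second if the id is already present,
    // i.e. an earlier attempt is assumed to still be in progress.
    if (!registering_.insert(agentId).second) {
      return Outcome::Ignored;
    }

    if (!requestIsValid) {
      // Rejection path (malformed version, unsupported version, ...).
      // Without this erase, the id stays in the set indefinitely.
      if (eraseOnFailure_) {
        registering_.erase(agentId);
      }
      return Outcome::Rejected;
    }

    // Success path: in this sketch the attempt completes immediately.
    registering_.erase(agentId);
    return Outcome::Accepted;
  }

private:
  bool eraseOnFailure_;
  std::unordered_set<std::string> registering_;
};
```

With `MasterSketch buggy(false)`, a rejected attempt for an agent id followed by a valid attempt for the same id yields `Outcome::Ignored`; with `MasterSketch fixed(true)`, the second attempt yields `Outcome::Accepted`.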



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
