Neil Conway created MESOS-7681:
----------------------------------

             Summary: Add safeguard for new agents with new features + old master
                 Key: MESOS-7681
                 URL: https://issues.apache.org/jira/browse/MESOS-7681
             Project: Mesos
          Issue Type: Improvement
            Reporter: Neil Conway


Consider this scenario:

* Mesos cluster with 3 masters and 1 agent.
* Two of the masters (including the leader) are upgraded to Mesos 1.4; the 
remaining master stays at Mesos 1.3 (e.g., due to operator error).
* Agent is upgraded to Mesos 1.4
* Framework creates a reservation refinement on the agent
* Leading master fails; Mesos 1.3 master is elected as the new leader

In this scenario, the agent will send resources to the master in the new 
(post-refinement) format, but the master will not understand those new fields. 
This results in an inconsistency between the agent's resources and the master's 
view of the agent's resources. This could lead to various problems -- in 
effect, the reservation the framework previously made has been "forgotten" 
during master failover. Similarly, if the framework attempts to unreserve the 
resources (using the master's version of the resource), that operation will be 
rejected by the agent.

To fix this, it seems we need an explicit negotiation between the agent and the 
master as part of registration/re-registration. The agent would examine its 
resources and say which capabilities it _requires_ of the master; if the master 
does not support those capabilities, the agent cannot safely register. We could 
implement this either via master capabilities (agent computes the master 
capabilities it requires and declines to register if the master isn't new 
enough), or via agent capabilities (agent tells master the capabilities it is 
"actively using"; master refuses to allow any agent to register that is using a 
capability the master doesn't recognize/support). Probably the former is 
safer/cleaner.
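
To make the first option (master capabilities) concrete, here is a minimal sketch in Python rather than actual Mesos code. All names in it (`Resource`, `RESERVATION_REFINEMENT`, `required_master_capabilities`, `can_register`) are hypothetical, and it assumes that a resource carrying more than one stacked reservation is what requires the new capability.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple

# Hypothetical capability name, used only for illustration.
RESERVATION_REFINEMENT = "RESERVATION_REFINEMENT"


@dataclass(frozen=True)
class Resource:
    """Simplified stand-in for an agent resource (not the Mesos protobuf)."""
    name: str
    # Stack of reservation roles. Assumption: more than one entry models a
    # refined reservation that an old master cannot represent.
    reservations: Tuple[str, ...] = ()


def required_master_capabilities(resources: Iterable[Resource]) -> FrozenSet[str]:
    """Compute the capabilities the agent requires of the master."""
    required = set()
    for resource in resources:
        if len(resource.reservations) > 1:
            required.add(RESERVATION_REFINEMENT)
    return frozenset(required)


def can_register(resources: Iterable[Resource],
                 master_capabilities: Iterable[str]) -> bool:
    """The agent declines to register unless the master advertises every
    capability that the agent's current resources depend on."""
    return required_master_capabilities(resources) <= frozenset(master_capabilities)


if __name__ == "__main__":
    refined = Resource("cpus", ("eng", "eng/dev"))
    # An old master advertises no capabilities; a new one advertises refinement.
    print(can_register([refined], []))
    print(can_register([refined], [RESERVATION_REFINEMENT]))
```

With this shape, an agent that holds no refined reservations can still register with an old master, so the check only blocks registration when the agent is actually using the new feature.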



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
