[ https://issues.apache.org/jira/browse/MESOS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Rukletsov updated MESOS-8058:
---------------------------------------
    Summary: Agent and master can race when updating agent state.  (was: Agent and master can race when updating agent state)

> Agent and master can race when updating agent state.
> ----------------------------------------------------
>
>                 Key: MESOS-8058
>                 URL: https://issues.apache.org/jira/browse/MESOS-8058
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.5.0
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Bannier
>            Priority: Critical
>              Labels: mesosphere
>             Fix For: 1.5.0
>
>
> In {{2af9a5b07dc80151154264e974d03f56a1c25838}} we introduced the use of
> {{UpdateSlaveMessage}} for the agent to inform the master about its current
> total resources. Currently we trigger this message only on agent registration
> and reregistration.
> This can race with operations applied in the master and communicated via
> {{CheckpointResourcesMessage}}.
> Example:
> 1. Agent ({{cpus:4(\*)}}) registers.
> 2. Master is triggered to apply an operation to the agent's resources, e.g.,
> a reservation: {{cpus:4(\*) -> cpus:4(A)}}. The master applies the operation
> to its current view of the agent's resources and sends the agent a
> {{CheckpointResourcesMessage}} so the agent can persist the result.
> 3. The agent sends the master an {{UpdateSlaveMessage}}, e.g., {{cpus:4(\*)}},
> since it hasn't received the {{CheckpointResourcesMessage}} yet.
> 4. The master processes the {{UpdateSlaveMessage}} and updates its view of
> the agent's resources to be {{cpus:4(\*)}}.
> 5. The agent processes the {{CheckpointResourcesMessage}} and updates its
> view of its resources to be {{cpus:4(A)}}.
> 6. The agent and the master now have an inconsistent view of the agent's
> resources.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
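
The interleaving in steps 1-6 of the issue description can be reproduced with a small toy model. This is only an illustrative sketch, not Mesos code: the classes {{Master}} and {{Agent}}, the function {{simulate_race}}, and the two in-memory queues are hypothetical names introduced here, and resources are modelled as plain strings. Only the message names {{UpdateSlaveMessage}} and {{CheckpointResourcesMessage}} come from the issue.

```python
# Hypothetical toy model of the race described in MESOS-8058; not Mesos code.
from collections import deque


class Master:
    def __init__(self):
        # The master's view of the agent's total resources.
        self.agent_resources = None

    def register_agent(self, resources):
        self.agent_resources = resources

    def apply_operation(self, new_resources, to_agent):
        # Step 2: apply the operation to the master's own view, then ask
        # the agent to persist the result.
        self.agent_resources = new_resources
        to_agent.append(("CheckpointResourcesMessage", new_resources))

    def handle(self, message):
        kind, resources = message
        if kind == "UpdateSlaveMessage":
            # Step 4: the master overwrites its view with the agent's total,
            # which may be stale.
            self.agent_resources = resources


class Agent:
    def __init__(self, resources):
        self.resources = resources

    def send_update(self, to_master):
        # Step 3: report the current total, unaware of any pending checkpoint.
        to_master.append(("UpdateSlaveMessage", self.resources))

    def handle(self, message):
        kind, resources = message
        if kind == "CheckpointResourcesMessage":
            # Step 5: persist the resources the master computed.
            self.resources = resources


def simulate_race():
    # One queue per direction; messages in opposite directions are unordered
    # relative to each other, which is what allows the race.
    to_agent, to_master = deque(), deque()
    agent = Agent("cpus:4(*)")
    master = Master()

    master.register_agent(agent.resources)         # step 1
    master.apply_operation("cpus:4(A)", to_agent)  # step 2
    agent.send_update(to_master)                   # step 3
    master.handle(to_master.popleft())             # step 4
    agent.handle(to_agent.popleft())               # step 5
    return master.agent_resources, agent.resources # step 6


master_view, agent_view = simulate_race()
print("master:", master_view, "agent:", agent_view)
```

In this model the master ends with the unreserved view while the agent ends with the reserved one, matching step 6. Swapping steps 3 and 5 in {{simulate_race}} makes both views agree, which shows that the outcome depends only on message ordering.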