[ 
https://issues.apache.org/jira/browse/MESOS-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-6317:
-----------------------------------
    Summary: Race in master/allocator when updating oversubscribed resources of 
an agent.  (was: Race in master update slave.)

> Race in master/allocator when updating oversubscribed resources of an agent.
> ----------------------------------------------------------------------------
>
>                 Key: MESOS-6317
>                 URL: https://issues.apache.org/jira/browse/MESOS-6317
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Guangya Liu
>            Assignee: Guangya Liu
>             Fix For: 1.1.0
>
>
> Currently, {{updateSlave}} in the master first rescinds offers and then 
> calls updateSlave in the allocator. This is racy: a batch allocation can be 
> inserted between the two, in which case the order becomes 
> rescind offers -> batch allocation -> update slave. This order causes 
> problems when the oversubscribed resources are decreased.
> Suppose the oversubscribed resources are decreased from 2 to 1. After the 
> offers are rescinded, the batch allocation allocates the old 2 
> oversubscribed resources again, and then update slave sets the total 
> oversubscribed resources to 1. The agent host is then overcommitted for some 
> time, because tasks can still use 2 oversubscribed resources rather than 1. 
> Once the tasks using the 2 oversubscribed resources finish, things return 
> to normal.
> So we should swap the order of rescinding offers and updateSlave in the 
> master to avoid resource overcommit.
> If we update the slave first and then rescind offers, the order becomes 
> update slave -> batch allocation -> rescind offers, which causes no problem 
> when decreasing resources. Suppose the oversubscribed resources are decreased 
> from 2 to 1. Update slave sets the total oversubscribed resources to 1 
> directly, the batch allocation does not allocate any oversubscribed 
> resources because more are already allocated than the new total, and 
> rescind offers then rescinds all offers using oversubscribed resources. 
> This does not leave the agent host overcommitted.
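
The two orderings described above can be replayed with a small toy model. This is only a sketch for illustration, not Mesos code: the Agent class, its method names, and the run() helper are all hypothetical, and the model assumes a single agent whose 2 oversubscribed resources are already out in offers when the total drops to 1.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Toy model of one agent's oversubscribed resources (hypothetical)."""
    total: int        # oversubscribed resources the allocator believes exist
    offered: int = 0  # oversubscribed resources in outstanding offers

    def rescind_offers(self) -> None:
        # The master rescinds outstanding offers; resources return to the pool.
        self.offered = 0

    def batch_allocation(self) -> None:
        # The allocator offers whatever it believes is still free.
        available = self.total - self.offered
        if available > 0:
            self.offered += available

    def update_slave(self, new_total: int) -> None:
        # The allocator learns the new oversubscribed total.
        self.total = new_total

    def overcommit(self) -> int:
        # Resources handed out beyond what the agent now has.
        return max(0, self.offered - self.total)


def run(order, new_total=1):
    """Apply the three steps in the given order and return the final state."""
    agent = Agent(total=2, offered=2)
    steps = {
        "rescind": agent.rescind_offers,
        "allocate": agent.batch_allocation,
        "update": lambda: agent.update_slave(new_total),
    }
    for name in order:
        steps[name]()
    return agent


# Current order: the batch allocation re-offers the stale total of 2.
print(run(["rescind", "allocate", "update"]).overcommit())  # 1
# Proposed order: the batch allocation sees total 1 and offers nothing.
print(run(["update", "allocate", "rescind"]).overcommit())  # 0
```

In this model the proposed order ends with no outstanding offers and a total of 1, so the next batch allocation can only offer what the agent actually has.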



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
