-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52465/
-----------------------------------------------------------

(Updated October 6, 2016, 2:23 a.m.)


Review request for mesos and Benjamin Mahler.


Bugs: MESOS-6317
    https://issues.apache.org/jira/browse/MESOS-6317


Repository: mesos


Description (updated)
-------

We need to call `updateSlave` first and rescind offers afterwards
because of a race condition: a batch allocation may be triggered
between rescinding offers and `updateSlave`. In that case the order
becomes rescind offer -> batch allocation -> update slave, which
causes problems when the oversubscribed resources decrease.

Suppose the oversubscribed resources decrease from 2 to 1. After the
offers are rescinded, the batch allocation offers the old 2
oversubscribed resources again, and only then does update slave set
the total oversubscribed resources to 1. The agent host is therefore
overcommitted for a while: tasks can still use 2 oversubscribed
resources instead of 1, and the host returns to normal only once the
tasks using those 2 oversubscribed resources finish.

If we update the slave first and then rescind offers, the order
becomes update slave -> batch allocation -> rescind offer, which has
no problem when resources shrink. Suppose the oversubscribed resources
shrink from 2 to 1: update slave sets the total oversubscribed
resources to 1 immediately, the batch allocation then allocates no
oversubscribed resources because more are already allocated than the
new total, and finally rescind offer rescinds all offers that use
oversubscribed resources. This does not leave the agent host
overcommitted.


Diffs (updated)
-----

  src/master/master.cpp 02a2fb29bdd8484fc90e5cb033ac29b49a141860 
  src/tests/oversubscription_tests.cpp 3dd34ea78ac795a6b0d342dcae86642c51841eea 

Diff: https://reviews.apache.org/r/52465/diff/


Testing (updated)
-------

make
make check

```
./bin/mesos-tests.sh \
  --gtest_filter="OversubscriptionTest.RescindRevocableOffer*" --gtest_repeat=20
```


Thanks,

Guangya Liu
