-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52465/#review151850
-----------------------------------------------------------

Ship it!


Just some minor things I noticed as I was committing the patch; I'll take care of them.


src/master/master.cpp (line 5554)
<https://reviews.apache.org/r/52465/#comment220451>

    This becomes "first".


src/master/master.cpp (line 5557)
<https://reviews.apache.org/r/52465/#comment220450>

    This is no longer "first".


- Benjamin Mahler


On Oct. 6, 2016, 2:23 a.m., Guangya Liu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52465/
> -----------------------------------------------------------
>
> (Updated Oct. 6, 2016, 2:23 a.m.)
>
>
> Review request for mesos and Benjamin Mahler.
>
>
> Bugs: MESOS-6317
>     https://issues.apache.org/jira/browse/MESOS-6317
>
>
> Repository: mesos
>
>
> Description
> -------
>
> We need to call `updateSlave` first and then rescind offers because
> of a race condition: a batch allocation may be triggered between
> rescinding the offers and `updateSlave`. In that case, the order is
> rescind offers -> batch allocation -> update slave, which causes
> problems when the oversubscribed resources are decreased.
>
> Suppose the oversubscribed resources are decreased from 2 to 1. After
> the offers are rescinded, the batch allocation allocates the old 2
> oversubscribed resources again, and then update slave sets the total
> oversubscribed resources to 1. The agent host is then overcommitted
> for some time, because tasks can still use 2 oversubscribed resources
> rather than 1. Once the tasks using the 2 oversubscribed resources
> finish, everything returns to normal.
>
> If we update the slave first and then rescind offers, the order is
> update slave -> batch allocation -> rescind offers, which causes no
> problem when shrinking resources. Suppose the oversubscribed resources
> are shrunk from 2 to 1. Update slave sets the total oversubscribed
> resources to 1 directly. The batch allocation then allocates no
> oversubscribed resources, since more are already allocated than the
> new total. Finally, rescinding removes all offers that use
> oversubscribed resources. This does not lead to the agent host being
> overcommitted.
>
>
> Diffs
> -----
>
>   src/master/master.cpp 02a2fb29bdd8484fc90e5cb033ac29b49a141860
>   src/tests/oversubscription_tests.cpp 3dd34ea78ac795a6b0d342dcae86642c51841eea
>
> Diff: https://reviews.apache.org/r/52465/diff/
>
>
> Testing
> -------
>
> make
> make check
>
> ```
> ./bin/mesos-tests.sh --gtest_filter="OversubscriptionTest.RescindRevocableOffer*" --gtest_repeat=20
> ```
>
>
> Thanks,
>
> Guangya Liu
>
>