-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52465/#review151127
-----------------------------------------------------------

Patch looks great!

Reviews applied: [52465]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker_build.sh

- Mesos ReviewBot


On Oct. 1, 2016, 9:34 a.m., Guangya Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52465/
> -----------------------------------------------------------
> 
> (Updated Oct. 1, 2016, 9:34 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> We need to call `updateSlave` first and rescind offers afterwards
> because of a race condition: a batch allocation may be triggered
> between the rescind and `updateSlave`. In that case the order is
> rescind offer -> batch allocation -> update slave, which causes
> problems when the oversubscribed resources have shrunk.
> 
> Suppose the oversubscribed resources shrink from 2 to 1. After the
> rescind finishes, the batch allocation offers the old 2
> oversubscribed resources again, and only then does update slave set
> the total oversubscribed resources to 1. The agent host is then
> overcommitted for some time, because tasks can still use 2
> oversubscribed resources rather than 1. Once the tasks using the 2
> oversubscribed resources finish, everything returns to normal.
> 
> If we update the slave first and then rescind offers, the order is
> update slave -> batch allocation -> rescind offer, which causes no
> problem when resources shrink. Suppose the oversubscribed resources
> shrink from 2 to 1. Update slave sets the total oversubscribed
> resources to 1 directly, the batch allocation then allocates no
> oversubscribed resources because more are already allocated than
> the new total, and finally rescind offer rescinds all offers that
> use oversubscribed resources. The agent host is never overcommitted.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp c83ee2f9fa05372748ff5056229fbe2bf06bfabb 
>   src/tests/oversubscription_tests.cpp 3dd34ea78ac795a6b0d342dcae86642c51841eea 
> 
> Diff: https://reviews.apache.org/r/52465/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
> 
> ```
> GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="OversubscriptionTest.RescindRevocableOffer*" --verbose
> ```
> 
> 
> Thanks,
> 
> Guangya Liu
> 
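
The two orderings in the description above can be replayed with a small toy model. This is only an illustrative sketch in Python, not Mesos code: `AllocatorModel`, `run`, and the step names are hypothetical, and the model assumes a single agent whose oversubscribed resources are all out in offers when the estimate shrinks from 2 to 1.

```python
from dataclasses import dataclass


@dataclass
class AllocatorModel:
    """Hypothetical toy model of one agent's oversubscribed resources."""
    total: int    # total oversubscribed resources the allocator knows about
    offered: int  # oversubscribed resources currently out in offers

    def update_slave(self, new_total: int) -> None:
        # The allocator learns the new (possibly smaller) total.
        self.total = new_total

    def batch_allocation(self) -> None:
        # Offer whatever is still free; offer nothing if the agent is
        # already allocated beyond the known total.
        self.offered += max(0, self.total - self.offered)

    def rescind_offers(self) -> None:
        # All outstanding offers of oversubscribed resources are withdrawn.
        self.offered = 0


def run(order, old_total=2, new_total=1):
    """Apply the steps in `order`; return (offered, overcommitted)."""
    model = AllocatorModel(total=old_total, offered=old_total)
    steps = {
        "update": lambda: model.update_slave(new_total),
        "batch": model.batch_allocation,
        "rescind": model.rescind_offers,
    }
    for step in order:
        steps[step]()
    # Overcommitted means more is on offer than the agent really has.
    return model.offered, model.offered > new_total
```

Under these assumptions, `run(["rescind", "batch", "update"])` ends with 2 resources on offer against a real total of 1, while `run(["update", "batch", "rescind"])` ends with nothing on offer, which matches the argument for updating the slave first.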