> On March 2, 2016, 10:43 p.m., Klaus Ma wrote: > > src/master/http.cpp, lines 1999-2000 > > <https://reviews.apache.org/r/44258/diff/1/?file=1276346#file1276346line1999> > > > > I think we can just remove `master->updateUnavailability(id, > > updated[id]);` here, so other machine will `UP` and scheduler will be > > updated in next loop. > > > > > > Futhur more, we can avoid the loop for update. The draft code will be: > > > > ``` > > hashmap<MachineID, Unavailability> updated; > > > > // delete the loop for "updated[id] = window.unavailability();" > > > > foreach (const mesos::maintenance::Window& window, schedule.windows()) { > > foreach (const MachineID& id, window.machine_ids()) { > > ... > > master->updateUnavailability(id, window.unavailability()); > > updated[id] = window.unavailability(); > > } > > } > > > > foreachkey (const MachineID& id, utils::copy(master->machines)) { > > if (updated.contains(id)) { > > continue; > > } > > > > // Transition each removed machine back to the `UP` mode and remove > > the > > // unavailability. > > master->machines[id].info.set_mode(MachineInfo::UP); > > master->updateUnavailability(id, None()); > > } > > ``` > > Joseph Wu wrote: > (Wish I could comment inline on a comment...) > > Your two loops won't update the `Unavailability` for machines that have > been brought DOWN previously. That's why we had > `master->updateUnavailability(id, updated[id]);` in two places.
@Joseph, I think `schedule.windows` will also include DOWN machines. We did check here: https://github.com/apache/mesos/blob/master/src/master/maintenance.cpp#L241 ; if the machine is DOWN but not in updated list, master'll reject the request. I have filed a JIRA (MESOS-4837) on whether this check is necessary. - Klaus ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/44258/#review121643 ----------------------------------------------------------- On March 3, 2016, 2:05 p.m., Guangya Liu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/44258/ > ----------------------------------------------------------- > > (Updated March 3, 2016, 2:05 p.m.) > > > Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu. > > > Bugs: MESOS-4831 > https://issues.apache.org/jira/browse/MESOS-4831 > > > Repository: mesos > > > Description > ------- > > There is a bug when setting host maintain with http endpoint: > https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021 > The logic is as this: > 1) Get all host list from maintain window and put it to updated hashmap. > 2) If the machine in was in updated was also in master->machines, call master > updateUnavailability to trigger recoverResources, updateUnavailability etc in > allocator > 3) Otherwise, clear the unavailabity time window for the machine. > 4) Update each new machines in updated to call master updateUnavailability > > But the logic in step 4) is getting all machines from the schedule windows > but not the machines that is new to the cluster, this caused master get two > updateUnavailability calls for a machine in the updated hashmap. > > The fix is filter machines in updated hashmap when handling new machines. > > > Diffs > ----- > > src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 > src/tests/master_maintenance_tests.cpp > 3faa8136cf57276295553910319480028f433e4c > > Diff: https://reviews.apache.org/r/44258/diff/ > > > Testing > ------- > > make > make check > ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose > > > Thanks, > > Guangya Liu > >