> On March 2, 2016, 6:43 a.m., Klaus Ma wrote:
> > src/master/http.cpp, lines 1999-2000
> > <https://reviews.apache.org/r/44258/diff/1/?file=1276346#file1276346line1999>
> >
> >     I think we can just remove `master->updateUnavailability(id, 
> > updated[id]);` here, so other machine will `UP` and scheduler will be 
> > updated in next loop.
> >     
> >     
> >     Futhur more, we can avoid the loop for update. The draft code will be:
> >     
> >     ```
> >     hashmap<MachineID, Unavailability> updated;
> >     
> >     // delete the loop for "updated[id] = window.unavailability();"
> >     
> >     foreach (const mesos::maintenance::Window& window, schedule.windows()) {
> >       foreach (const MachineID& id, window.machine_ids()) {
> >         ...
> >         master->updateUnavailability(id, window.unavailability());
> >         updated[id] = window.unavailability();
> >       }
> >     }
> >     
> >     foreachkey (const MachineID& id, utils::copy(master->machines)) {
> >       if (updated.contains(id)) {
> >         continue;
> >       }
> >     
> >       // Transition each removed machine back to the `UP` mode and remove 
> > the
> >       // unavailability.
> >       master->machines[id].info.set_mode(MachineInfo::UP);
> >       master->updateUnavailability(id, None());
> >     }
> >     ```

(Wish I could comment inline on a comment...)

Your two loops won't update the `Unavailability` for machines that have been 
brought DOWN previously.  That's why we had `master->updateUnavailability(id, 
updated[id]);` in two places.


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44258/#review121643
-----------------------------------------------------------


On March 1, 2016, 11:45 p.m., Guangya Liu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44258/
> -----------------------------------------------------------
> 
> (Updated March 1, 2016, 11:45 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar, Joris Van Remoortere, and Joseph Wu.
> 
> 
> Bugs: MESOS-4831
>     https://issues.apache.org/jira/browse/MESOS-4831
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> There is a bug when setting host maintain with http endpoint: 
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1987-L2021
> The logic is as this:
> 1) Get all host list from maintain window and put it to updated hashmap.
> 2) If the machine in was in updated was also in master->machines, call master 
> updateUnavailability to trigger recoverResources, updateUnavailability etc in 
> allocator
> 3) Otherwise, clear the unavailabity time window for the machine.
> 4) Update each new machines in updated to call master updateUnavailability
> 
> But the logic in step 4) is getting all machines from the schedule windows 
> but not the machines that is new to the cluster, this caused master get two 
> updateUnavailability calls for a machine in the updated hashmap.
> 
> The fix is filter machines in updated hashmap when handling new machines.
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp 5e9e28e904ba0045ee27eb828f47231632a91d74 
>   src/tests/master_maintenance_tests.cpp 
> 3faa8136cf57276295553910319480028f433e4c 
> 
> Diff: https://reviews.apache.org/r/44258/diff/
> 
> 
> Testing
> -------
> 
> make
> make check
>  ./bin/mesos-tests.sh --gtest_filter="MasterMaintenanceTest.*" --verbose
> 
> 
> Thanks,
> 
> Guangya Liu
> 
>

Reply via email to