Re: Review Request 70325: Updated the master to allocate recovered orphan operation resources.

Greg Mann Wed, 27 Mar 2019 16:55:36 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70325/#review214141
-----------------------------------------------------------





src/master/master.cpp
Lines 10538-10541 (patched)
<https://reviews.apache.org/r/70325/#comment300321>

    Looking at this again, I guess I should build up a
    `hashmap<SlaveID, std::pair<Resources, Resources>>` and make just one
    `addAgentResources()` call per agent.
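
    A rough, self-contained sketch of that per-agent accumulation follows.
    It is not the patch itself: `Resources`, `OrphanOperation`, and
    `FakeAllocator` are hypothetical stand-ins, `std::unordered_map` stands
    in for stout's `hashmap`, and both the `addAgentResources()` signature
    and the meaning of the pair (first = total, second = used) are
    assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the Mesos types; the real `SlaveID` and
// `Resources` are protobuf-backed and far richer than this.
using SlaveID = std::string;

struct Resources {
  double cpus;
  Resources(double c = 0.0) : cpus(c) {}
  Resources& operator+=(const Resources& that) {
    cpus += that.cpus;
    return *this;
  }
};

// Hypothetical record of a recovered orphan operation.
struct OrphanOperation {
  SlaveID slaveId;
  Resources consumed;
};

// Assumed meaning of the pair: first = total to add, second = used.
using AgentResources =
  std::unordered_map<SlaveID, std::pair<Resources, Resources>>;

// Hypothetical allocator stub; the signature is an assumption.
struct FakeAllocator {
  AgentResources added;

  void addAgentResources(
      const SlaveID& slaveId,
      const Resources& total,
      const Resources& used) {
    // The point of the accumulation: at most one call per agent.
    assert(added.count(slaveId) == 0);
    added.emplace(slaveId, std::make_pair(total, used));
  }
};

// Sums the consumed resources of all orphan operations, keyed by agent.
AgentResources groupByAgent(const std::vector<OrphanOperation>& operations) {
  AgentResources result;
  for (const OrphanOperation& operation : operations) {
    std::pair<Resources, Resources>& entry = result[operation.slaveId];
    entry.first += operation.consumed;   // Added to the agent's total.
    entry.second += operation.consumed;  // Tracked as already allocated.
  }
  return result;
}

// Issues exactly one allocator call per agent instead of one per operation.
void allocateOrphanOperations(
    FakeAllocator& allocator,
    const std::vector<OrphanOperation>& operations) {
  for (const auto& entry : groupByAgent(operations)) {
    allocator.addAgentResources(
        entry.first, entry.second.first, entry.second.second);
  }
}
```

    With two orphan operations on one agent and one on another, this makes
    two allocator calls rather than three.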


- Greg Mann


On March 27, 2019, 7:59 p.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70325/
> -----------------------------------------------------------
> 
> (Updated March 27, 2019, 7:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gastón Kleiman, Joseph Wu, and 
> Meng Zhu.
> 
> 
> Bugs: MESOS-9635
>     https://issues.apache.org/jira/browse/MESOS-9635
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch updates the master's framework recovery code to use
> the allocator's `addAgentResources()` method rather than
> `updateSlave()` when recovering orphan operations. This tracks
> the allocation of the operations' consumed resources, so that
> those resources are not incorrectly offered to frameworks while
> the operation is still pending.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp acc67d3763ddee9027e6cf375f1d495ff5805026 
> 
> 
> Diff: https://reviews.apache.org/r/70325/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> To verify the flaky test fix, the following command was executed both before 
> and after the patches were applied, while `stress -c <num_cores_on_machine>` 
> was being run:
> `bin/mesos-tests.sh --gtest_filter="*AgentPendingOperationAfterMasterFailover*" --gtest_repeat=-1 --gtest_break_on_failure`
> 
> Before the patches were applied, the test would reliably fail after fewer
> than 50 repetitions. After the patches were applied, the test could be run
> for hundreds of repetitions with no failures.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>
