-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70325/#review214350
-----------------------------------------------------------


Fix it, then Ship it!




Looks good.

The description needs to be updated regarding the `addAgentResources()`


src/master/master.cpp
Lines 10548-10549 (original), 10549-10552 (patched)
<https://reviews.apache.org/r/70325/#comment300542>

    Lets just say:
    
    // This call to `addResourceProvider()` adds orphan operations 
    // resources back to the agent's total and used resources.
    // This prevents the resources from being offered while the
    // operation is still pending.
    
    > updates the role's allocation
    
    This does not happen with `addResourceProvider()`, it happens in 
`addFramework()`.


- Meng Zhu


On April 3, 2019, 5:10 p.m., Greg Mann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70325/
> -----------------------------------------------------------
> 
> (Updated April 3, 2019, 5:10 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Gastón Kleiman, Joseph Wu, and 
> Meng Zhu.
> 
> 
> Bugs: MESOS-9635
>     https://issues.apache.org/jira/browse/MESOS-9635
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch updates the master's framework recovery code to use
> the allocator's `addAgentResources()` method rather than
> `updateSlave()` when recovering orphan operations, which has the
> benefit of tracking the allocation of the operations' consumed
> resources, avoiding situations in which those resources would be
> incorrectly offered to frameworks while the operation is still
> in a pending state.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp cf5caa0893ba1387a1f3a9d129ecd7d974f776bd 
> 
> 
> Diff: https://reviews.apache.org/r/70325/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> To verify the flaky test fix, the following command was executed both before 
> and after the patches were applied, while `stress -c <num_cores_on_machine>` 
> was being run:
> `bin/mesos-tests.sh 
> --gtest_filter="*AgentPendingOperationAfterMasterFailover*" --gtest_repeat=-1 
> --gtest_break_on_failure`
> 
> Before the patches were applied, the test would reliably fail after less than 
> 50 repetitions. After the patches are applied, the test can be run for 
> hundreds of repetitions with no failures.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>

Reply via email to