----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70283/#review213938 -----------------------------------------------------------
As pointed out by Greg earlier, this patch violates the comment here: https://github.com/apache/mesos/blob/4580834471fb3bc0b95e2b96e04a63d34faef724/src/master/allocator/mesos/hierarchical.cpp#L769-L770 While I think the current code seems to work with this patch (except the comment mentioned above), I think we should check with @Bbannier. In particular, the `TODO` here https://github.com/apache/mesos/blob/4580834471fb3bc0b95e2b96e04a63d34faef724/src/master/master.cpp#L8330-L8334: ``` // TODO(bbannier): Consider introducing ways of making sure an agent // always knows the `FrameworkInfo` of operations triggered on its // resources, e.g., by adding an explicit `FrameworkInfo` to // operations like is already done for `RunTaskMessage`, see // MESOS-8582. ``` Preferably, it will be great if we can fix the above TODO and ensure frameworkInfo is always available. src/master/master.cpp Line 8384 (original), 8371 (patched) <https://reviews.apache.org/r/70283/#comment300050> can we separate out irrelevant changes? - Meng Zhu On March 22, 2019, 1:37 p.m., Gastón Kleiman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/70283/ > ----------------------------------------------------------- > > (Updated March 22, 2019, 1:37 p.m.) > > > Review request for mesos, Benjamin Bannier, Greg Mann, Joseph Wu, and Meng > Zhu. > > > Bugs: MESOS-9635 > https://issues.apache.org/jira/browse/MESOS-9635 > > > Repository: mesos > > > Description > ------- > > This patch makes the master's `UpdateSlaveMessage` handler include > resources consumed by orphan operations when calling > `allocator->addResourceProvider()`. > > The change prevents some races that lead to the master reoffering the > resources consumed by the operations and makes the > `OperationReconciliationTest.AgentPendingOperationAfterMasterFailover` > test stable. > > > Diffs > ----- > > src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 > > > Diff: https://reviews.apache.org/r/70283/diff/1/ > > > Testing > ------- > > `OperationReconciliationTest.AgentPendingOperationAfterMasterFailover` passed > over 5000 iterations under stress. Other tests still pass on GNU/Linux. > > > Thanks, > > Gastón Kleiman > >