-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71646/#review218327
-----------------------------------------------------------
Thanks Andrei! Nice work.

The Testing Done section doesn't make its way into the commit message, but I find it helpful for the commit message to contain a concise summary of the performance improvements that the patch provides, for posterity, e.g.: https://github.com/apache/mesos/commit/de90b2b3078e06975ab2cccc061db821cfe7dda8


src/master/allocator/mesos/sorter/drf/sorter.cpp
Lines 456-458 (original), 456-458 (patched)
<https://reviews.apache.org/r/71646/#comment305985>

I'm a little puzzled about this case: do we still need to treat empty resources as a special case? I'm not sure why it's there in the first place, and it seems simpler without the special casing. I assume this is what made double removal not an error.


src/master/allocator/mesos/sorter/drf/sorter.cpp
Line 464 (original), 460-463 (patched)
<https://reviews.apache.org/r/71646/#comment305984>

How about s/const auto insertionResult/bool inserted/?


src/master/allocator/mesos/sorter/drf/sorter.cpp
Line 483 (original), 477 (patched)
<https://reviews.apache.org/r/71646/#comment305986>

How about s/resourcesIter/agent/?


src/master/allocator/mesos/sorter/random/sorter.cpp
Lines 390-423 (original), 390-410 (patched)
<https://reviews.apache.org/r/71646/#comment305987>

Ditto the above comments here.


src/master/allocator/mesos/sorter/sorter.hpp
Lines 126-131 (original), 126-134 (patched)
<https://reviews.apache.org/r/71646/#comment305983>

If double insertion fails, let's have double removal fail as well (rather than being a silent no-op)? This means the internal special-case logic for empty resources needs to go away?

Perhaps also add a note about how adding/removing the entire agent is cheaper than the old code that updated an agent's resources when they change? That seems non-obvious (i.e. if reservations get made, we have to remove and re-add the agent; there's no ability to update the total).


src/tests/sorter_tests.cpp
Lines 514-515 (original), 514-515 (patched)
<https://reviews.apache.org/r/71646/#comment305988>

Is this totalResources variable needed? Maybe just inline all of these to simplify?

```
sorter.addSlave(slaveId, *Resources::parse("cpus:100;mem:100"));
```

Seems more readable to me. Ditto for the other tests.


src/tests/sorter_tests.cpp
Line 1656 (original), 1655 (patched)
<https://reviews.apache.org/r/71646/#comment305989>

s/consider/Consider/

Can this TODO expand on why?


src/tests/sorter_tests.cpp
Lines 1831-1832 (original)
<https://reviews.apache.org/r/71646/#comment305990>

Maybe the note on addSlave/removeSlave also needs to mention that the immutable approach (add/remove and no update) doesn't have to deal with the shared resource accounting trickiness that update has to deal with?
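
To make the sorter.hpp suggestion above concrete, here is a rough standalone sketch (hypothetical names and simplified types, with assert() standing in for CHECK(); not the actual Sorter code) in which both adding an already-added agent and removing an unknown agent fail loudly rather than being silent no-ops:

```
#include <cassert>
#include <map>
#include <string>

struct ToySorter
{
  // agentId -> that agent's total resources (simplified to a string here).
  std::map<std::string, std::string> agents;

  void addSlave(const std::string& slaveId, const std::string& total)
  {
    bool inserted = agents.emplace(slaveId, total).second;
    assert(inserted && "adding an already added agent should fail");
  }

  void removeSlave(const std::string& slaveId)
  {
    auto agent = agents.find(slaveId);
    assert(agent != agents.end() && "removing an unknown agent should fail");
    agents.erase(agent);
  }
};

int main()
{
  ToySorter sorter;
  sorter.addSlave("agent1", "cpus:100;mem:100");
  sorter.removeSlave("agent1");     // OK: the agent is known.
  // sorter.removeSlave("agent1");  // Would now trip the assert/CHECK.
  return 0;
}
```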

- Benjamin Mahler


On Oct. 22, 2019, 1:41 p.m., Andrei Sekretenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71646/
> -----------------------------------------------------------
> 
> (Updated Oct. 22, 2019, 1:41 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-10015
>     https://issues.apache.org/jira/browse/MESOS-10015
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch replaces Sorter methods which modify `Resources` of an agent
> in the Sorter with methods which add/remove an agent as a whole
> (which was actually the only use case of the old methods).
> 
> Thus, subtracting/adding `Resources` of the whole agent no longer
> occurs when updating resources of the agent in the Sorter. This
> mitigates the issue with poor performance of
> `HierarchicalAllocatorProcess::updateAllocation()` for agents with
> a huge number of non-addable resources (see MESOS-10015).
> 
> 
> Diffs
> -----
> 
>   src/master/allocator/mesos/hierarchical.cpp 21010de363f25c516bb031e4ae48888e53621128 
>   src/master/allocator/mesos/sorter/drf/sorter.hpp 3f6c7413f1b76f3fa86388360983763c8b76079f 
>   src/master/allocator/mesos/sorter/drf/sorter.cpp ef79083b710fba628b4a7e93f903883899f8a71b 
>   src/master/allocator/mesos/sorter/random/sorter.hpp a3097be98d175d2b47714eb8b70b1ce8c5c2bba8 
>   src/master/allocator/mesos/sorter/random/sorter.cpp 86aeb1b8136eaffd2d52d3b603636b01383a9024 
>   src/master/allocator/mesos/sorter/sorter.hpp 6b6b4a1811ba36e0212de17b9a6e63a6f8678a7f 
>   src/tests/sorter_tests.cpp d7fdee8f2cab4c930230750f0bd1a55eb08f89bb 
> 
> 
> Diff: https://reviews.apache.org/r/71646/diff/1/
> 
> 
> Testing
> -------
> 
> **make check**
> 
> **Variant of
> `ReservationParam/HierarchicalAllocator__BENCHMARK_WithReservationParam`**
> from https://reviews.apache.org/r/71639/ (work in progress)
> shows a significant improvement and a change from O(number_of_roles^3) to
> O(number_of_roles^2):
> 
> **Before**:
> Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.08586secs
> Average UNRESERVE duration: 51.491561ms
> Average RESERVE duration: 52.801438ms
> 
> Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 13.8449005secs
> Average UNRESERVE duration: 347.624639ms
> Average RESERVE duration: 344.620385ms
> 
> Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 2.19253121188333mins
> Average UNRESERVE duration: 3.285422441secs
> Average RESERVE duration: 3.292171194secs
> 
> Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
> (killed after several minutes)
> 
> **After:**
> 
> Agent resources size: 200 (50 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 559.381603ms
> Average UNRESERVE duration: 14.221942ms
> Average RESERVE duration: 13.747137ms
> 
> Agent resources size: 400 (100 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 1.186726648secs
> Average UNRESERVE duration: 29.569935ms
> Average RESERVE duration: 29.766396ms
> 
> Agent resources size: 800 (200 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 3.269639813secs
> Average UNRESERVE duration: 82.652908ms
> Average RESERVE duration: 80.829081ms
> 
> Agent resources size: 1600 (400 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 11.156949417secs
> Average UNRESERVE duration: 278.158063ms
> Average RESERVE duration: 279.689407ms
> 
> Agent resources size: 3200 (800 roles, 1 reservations per role, 1 port ranges)
> Made 20 reserve and unreserve operations in 44.796308954secs
> Average UNRESERVE duration: 1.114141457secs
> Average RESERVE duration: 1.12567399secs
> 
> **No significant performance changes in
> `QuotaParam/BENCHMARK_HierarchicalAllocator_WithQuotaParam.LargeAndSmallQuota`.**
> 
> **Before:**
> 
> Added 30 agents in 1.175593ms
> Added 30 frameworks in 6.829173ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 8.294832ms
> Made 0 allocation in 3.674923ms
> 
> Added 300 agents in 7.860046ms
> Added 300 frameworks in 149.743858ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 132.796102ms
> Made 0 allocation in 107.887758ms
> 
> Added 3000 agents in 36.944587ms
> Added 3000 frameworks in 10.688501403secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 12.6020582secs
> Made 0 allocation in 9.716229696secs
> 
> Added 30 agents in 1.010362ms
> Added 30 frameworks in 6.272027ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
> Made 38 allocations in 9.119976ms
> Made 0 allocation in 5.460369ms
> 
> Added 300 agents in 7.442897ms
> Added 300 frameworks in 152.016597ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
> Made 391 allocations in 195.242282ms
> Made 0 allocation in 139.638551ms
> 
> Added 3000 agents in 36.003028ms
> Added 3000 frameworks in 11.203697649secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 17.807913455secs
> Made 0 allocation in 13.524946653secs
> 
> **After:**
> 
> Added 30 agents in 1.196576ms
> Added 30 frameworks in 6.814792ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with drf sorter
> Made 36 allocations in 8.263036ms
> Made 0 allocation in 3.947283ms
> 
> Added 300 agents in 8.497121ms
> Added 300 frameworks in 156.578165ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with drf sorter
> Made 350 allocations in 168.745307ms
> Made 0 allocation in 95.505069ms
> 
> Added 3000 agents in 38.074525ms
> Added 3000 frameworks in 11.249150205secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with drf sorter
> Made 3500 allocations in 12.772526049secs
> Made 0 allocation in 10.132801781secs
> 
> Added 30 agents in 799844ns
> Added 30 frameworks in 5.8663ms
> Benchmark setup: 30 agents, 30 roles, 30 frameworks, with random sorter
> Made 38 allocations in 9.612524ms
> Made 0 allocation in 5.150924ms
> 
> Added 300 agents in 5.560583ms
> Added 300 frameworks in 138.469712ms
> Benchmark setup: 300 agents, 300 roles, 300 frameworks, with random sorter
> Made 391 allocations in 175.021255ms
> Made 0 allocation in 138.181869ms
> 
> Added 3000 agents in 42.921689ms
> Added 3000 frameworks in 10.825018278secs
> Benchmark setup: 3000 agents, 3000 roles, 3000 frameworks, with random sorter
> Made 3856 allocations in 15.29232742secs
> Made 0 allocation in 14.202057473secs
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
> 
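
As a toy illustration of the change described above (hypothetical names and simplified types; not the Mesos code): the old update-style path subtracted and re-added an agent's entire resource set element by element, while the add/remove path just erases and re-inserts the agent's entry as a whole, so no per-resource arithmetic happens when reservations change.

```
#include <map>
#include <string>
#include <vector>

// Stand-in for `Resources`: a flat list of (possibly non-addable) entries.
using AgentResources = std::vector<std::string>;

// Old shape: one flat collection for all agents; updating an agent subtracts
// its old resources and adds its new ones, scanning the collection each time.
struct FlatTotal
{
  AgentResources all;

  void update(const AgentResources& oldRes, const AgentResources& newRes)
  {
    for (const std::string& r : oldRes) {              // O(|all| * |oldRes|)
      for (auto it = all.begin(); it != all.end(); ++it) {
        if (*it == r) { all.erase(it); break; }
      }
    }
    all.insert(all.end(), newRes.begin(), newRes.end());
  }
};

// New shape: one entry per agent; a change is modeled as remove + add of the
// whole agent, with no element-wise arithmetic over resources.
struct PerAgentTotal
{
  std::map<std::string, AgentResources> agents;

  void removeSlave(const std::string& id) { agents.erase(id); }
  void addSlave(const std::string& id, AgentResources res)
  {
    agents.emplace(id, std::move(res));
  }
};

int main()
{
  PerAgentTotal total;
  total.addSlave("agent1", {"cpus(roleA):1", "mem(roleA):128"});

  // A reservation change on agent1: remove and re-add the whole agent.
  total.removeSlave("agent1");
  total.addSlave("agent1", {"cpus(roleB):1", "mem(roleB):128"});
  return 0;
}
```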
