-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68132/
-----------------------------------------------------------

(Updated Aug. 11, 2018, 6:09 p.m.)


Review request for mesos, Benno Evers and Benjamin Mahler.


Bugs: MESOS-9122
    https://issues.apache.org/jira/browse/MESOS-9122


Repository: mesos


Description
-------

With this patch, handlers for '/state' requests are not scheduled directly after authorization, but are accumulated and then scheduled for later parallel processing.

If there are N '/state' requests in the Master's mailbox and T is the response time of a single request, this approach blocks the Master actor only once, for time O(T), instead of for time N*T as before this patch.

This batching technique reduces both the time the Master spends answering '/state' requests and the average request response time when multiple requests are present in the Master's mailbox. However, for infrequent '/state' requests that don't accumulate in the Master's mailbox, the response time might increase due to an added trip through the mailbox.

The change preserves the read-your-writes consistency model.


Diffs
-----

  src/master/http.cpp d43fbd689598612ec5946b46e2fa5e7f5e22cfa8
  src/master/master.hpp 209b998db8d2bad7a3812df44f0939458f48eb11


Diff: https://reviews.apache.org/r/68132/diff/2/


Testing
-------

`make check` on Mac OS 10.13.5 and various Linux distros.

Ran the `MasterStateQueryLoad_BENCHMARK_Test.v0State` and `MasterStateQuery_BENCHMARK_Test.GetState` benchmarks, see below.
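
For readers unfamiliar with the approach, the batching idea from the Description can be illustrated with a toy model. This is only a sketch and not the code in this patch: `ToyMaster`, `requestState`, `addTask` and the other names are hypothetical, the mailbox is a plain `std::deque` rather than a libprocess mailbox, and authorization is omitted. The first accumulated request schedules one batch-processing event at the tail of the mailbox, so requests already waiting join the same batch and the actor is blocked once while all responses are rendered in parallel.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy actor; none of these names come from the Mesos code base.
class ToyMaster {
public:
  // Enqueues a state mutation (a "write").
  void addTask() {
    mailbox.push_back([this] { ++tasks; });
  }

  // Enqueues a '/state'-like request and returns a future for its response.
  std::future<std::string> requestState() {
    auto response = std::make_shared<std::promise<std::string>>();
    std::future<std::string> result = response->get_future();
    mailbox.push_back([this, response] { accumulate(response); });
    return result;
  }

  // Drains the mailbox one event at a time, like an actor's run loop.
  void run() {
    while (!mailbox.empty()) {
      std::function<void()> event = std::move(mailbox.front());
      mailbox.pop_front();
      event();
    }
  }

  // Number of times the actor was blocked to serve a batch.
  int batchesProcessed() const { return batches; }

private:
  using Response = std::shared_ptr<std::promise<std::string>>;

  // Stores the request instead of answering it right away. Only the first
  // request of a batch schedules the processing event.
  void accumulate(Response response) {
    batch.push_back(std::move(response));
    if (batch.size() == 1) {
      mailbox.push_back([this] { processBatch(); });
    }
  }

  // Renders all accumulated responses in parallel. The actor waits for all
  // workers, so no mutation can run while the state is being read.
  void processBatch() {
    assert(!batch.empty());
    ++batches;
    std::vector<Response> pending;
    pending.swap(batch);

    std::vector<std::future<void>> workers;
    for (const Response& response : pending) {
      workers.push_back(std::async(std::launch::async, [this, response] {
        response->set_value(render());
      }));
    }
    for (std::future<void>& worker : workers) {
      worker.get();
    }
  }

  std::string render() const { return "tasks=" + std::to_string(tasks); }

  int tasks = 0;
  int batches = 0;
  std::vector<Response> batch;
  std::deque<std::function<void()>> mailbox;
};
```

In this model, a write enqueued before several requests is visible in all of their responses, and the requests are served by a single call to `processBatch`, which mirrors the read-your-writes and "block once" claims above. A lone request still needs two mailbox events (accumulate, then process), which corresponds to the extra mailbox trip mentioned in the Description.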

**Setup**

Processor: Intel i7-4980HQ 2.8 GHz with 6 MB on-chip L3 cache and 128 MB L4 cache (Crystalwell)
Total Number of Cores: 4
Total Number of Threads: 8
L2 Cache (per Core): 256 KB

Compiler: Apple LLVM version 9.1.0 (clang-902.0.39.2)
Optimization: -O2

**MasterStateQuery_BENCHMARK_Test.GetState, v0 '/state' response time**

setup                                                    | no batching | batching
---------------------------------------------------------|-------------|----------
1000 agents, 10000 running, and 10000 completed tasks    | 146.496ms   | 158.319ms
10000 agents, 100000 running, and 100000 completed tasks | 1.795s      | 1.899s
20000 agents, 200000 running, and 200000 completed tasks | 3.742s      | 4.427s
40000 agents, 400000 running, and 400000 completed tasks | 10.946s     | 11.096s

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 1**

Test setup 1: 100 agents with a total of 10000 running tasks and 10000 completed tasks; 50 '/state' and '/flags' requests are sent in parallel at 200ms intervals, i.e., a total of **50 measurements** per endpoint.

    | /flags, no batching | /flags, batching | /state, no batching | /state, batching
----|---------------------|------------------|---------------------|-----------------
min | 1.598ms             | 1.475ms          | 100.627ms           | 105.383ms
p25 | 2.370ms             | 2.452ms          | 102.206ms           | 107.184ms
p50 | 2.520ms             | 2.562ms          | 103.213ms           | 108.468ms
p75 | 2.623ms             | 2.665ms          | 104.100ms           | 109.808ms
p90 | 2.803ms             | 2.731ms          | 106.079ms           | 111.043ms
max | 84.957ms            | 2.934ms          | 153.438ms           | 154.636ms

**MasterStateQueryLoad_BENCHMARK_Test.v0State, setup 2**

Test setup 2: 1000 agents with a total of 100000 running tasks and 100000 completed tasks; 10 '/state' and '/flags' requests are sent in parallel at 200ms intervals, i.e., a total of **10 measurements** per endpoint.

    | /flags, no batching | /flags, batching | /state, no batching | /state, batching
----|---------------------|------------------|---------------------|-----------------
min | 2.309ms             | 1.579ms          | 1.512s              | 2.820s
p25 | 1.547s              | 373.609ms        | 3.262s              | 3.588s
p50 | 3.189s              | 831.261ms        | 5.052s              | 4.253s
p75 | 5.346s              | 2.215s           | 6.846s              | 4.510s
p90 | 5.854s              | 2.351s           | 7.883s              | 4.705s
max | 7.237s              | 2.444s           | 8.517s              | 4.934s


Thanks,

Alexander Rukletsov