> On Oct. 19, 2017, 6:38 p.m., Benjamin Mahler wrote:
> > Thanks Yan! I will dig in soon.
> >
> > Just some quick questions:
> >
> > (1) I thought during the meeting you said it was taking a minute, but
> > looking at all the benchmark timings they're all under a second? Is it only
> > the benchmark setup that's expensive here?
> > (2) Is this with the lock free event & run queues? If not, how much do they
> > help?
> > (3) As an aside, it has come up before, but it would be useful to be able
> > to force the messages to go through the remote stack rather than the local
> > stack. No need to think about this yet, but just something to keep in mind
> > as not being accurate in this benchmark.
>
> Jiang Yan Xu wrote:
> 1) Yeah looks like it. I used to include the setup time so it was large.
> 2) Yeah I have used `--enable-optimize --enable-lock-free-run-queue
> --enable-lock-free-event-queue
> --enable-last-in-first-out-fixed-size-semaphore`. I could compare with the
> perf without them.
> 3) Right right I think we should keep that in mind and we should have
> tests that cover the remote stack. For the case here I thought it would be a
> simple and good-enough start since the local stack already covers the proto
> (de)serialization and the rest of the libprocess optimization that we have
> recently improved.
Haha... actually the sub-second numbers in revision 1 were totally meaningless. I called `process::await(reregistered)` instead of `process::await(reregistered).await();`, so the benchmark never actually waited for the results...

In rev 2 I did some optimization, e.g., parallelizing the message preparation and allocating from the stack instead of the heap, but I had to reduce the number of tasks to prevent it from running too long. PTAL.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
-----------------------------------------------------------


On Oct. 24, 2017, 11:05 a.m., Jiang Yan Xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
>
> (Updated Oct. 24, 2017, 11:05 a.m.)
>
>
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
>
>
> Bugs: MESOS-8098
> https://issues.apache.org/jira/browse/MESOS-8098
>
>
> Repository: mesos
>
>
> Description
> -------
>
> The current benchmark is very simple (no framework involvement and no
> agent retries), but it's possible to add a number of others, so I am
> creating a new file for them.
>
>
> Diffs
> -----
>
> src/Makefile.am b60a54a031260de6f1fb43584ae5083df2dc7e31
> src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76
> src/tests/master_benchmarks.cpp PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/63174/diff/2/
>
>
> Testing
> -------
>
> Benchmark based off
> https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a
> (close to current HEAD).
>
> ```
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000
> completed tasks in 11.188008209secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> (22404 ms)
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed
> tasks in 20.868372615secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> (37981 ms)
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0
> completed tasks in 15.354579251secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> (33766 ms)
> [----------] 3 tests from
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (94151 ms total)
>
>
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000
> completed tasks in 11.045441129secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> (19959 ms)
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed
> tasks in 21.324309077secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> (38490 ms)
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0
> completed tasks in 14.68607521secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> (32073 ms)
> [----------] 3 tests from
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (90523 ms total)
>
> ```
>
> Benchmark based off
> https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d
> (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
>
> ```
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000
> completed tasks in 23.217901878secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> (38327 ms)
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed
> tasks in 46.158610597secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> (75280 ms)
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0
> completed tasks in 38.56781112secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> (68006 ms)
> [----------] 3 tests from
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (181613 ms total)
>
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 100000 running tasks and 100000
> completed tasks in 25.752844224secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> (43509 ms)
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Starting reregistration for all agents
> Reregistered 2000 agents with a total of 200000 running tasks and 0 completed
> tasks in 45.190859035secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> (73966 ms)
> [ RUN ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Starting reregistration for all agents
> Reregistered 20000 agents with a total of 100000 running tasks and 0
> completed tasks in 36.322992753secs
> [ OK ]
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> (66946 ms)
> [----------] 3 tests from
> AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (184421 ms total)
> ```
>
> The recent patches cut the reregistration time by over 50% in all three
> cases. These were built with
> `--enable-optimize --enable-lock-free-run-queue
> --enable-lock-free-event-queue
> --enable-last-in-first-out-fixed-size-semaphore`.
>
>
> Thanks,
>
> Jiang Yan Xu
>
>
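
As a sanity check on the "over 50%" figure, here is a minimal sketch that recomputes the reduction from the reregistration times in the logs above. It is not part of the patch: the names `Sample`, `samples`, `percentReduction` and `minReduction` are made up for illustration, and pairing the first run of each build with each other is an arbitrary choice.

```cpp
#include <cassert>

// Reregistration times in seconds, copied from the first run of each build
// in the logs above. All names here are hypothetical, for illustration only.
struct Sample {
  int testCase;   // gtest parameter index (/0, /1, /2)
  double before;  // build at d9c90bf (before MESOS-7713 was merged)
  double after;   // build at 4119318 (close to HEAD at the time)
};

constexpr Sample samples[] = {
  {0, 23.217901878, 11.188008209},
  {1, 46.158610597, 20.868372615},
  {2, 38.56781112, 15.354579251},
};

// Percentage by which `after` is smaller than `before`.
double percentReduction(double before, double after)
{
  assert(before > 0.0);
  return (before - after) / before * 100.0;
}

// Smallest reduction across all samples, i.e. the worst case for the claim.
double minReduction()
{
  double result = 100.0;
  for (const Sample& s : samples) {
    const double r = percentReduction(s.before, s.after);
    if (r < result) {
      result = r;
    }
  }
  return result;
}
```

With these numbers `minReduction()` should come out a bit under 52% (case /0), and case /2 shows the largest reduction at around 60%.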