Re: Review Request 63174: Added a benchmark for agent reregistration during master failover.

Benjamin Mahler Thu, 19 Oct 2017 18:39:07 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63174/#review188799
-----------------------------------------------------------




Thanks Yan! I will dig in soon.

Just some quick questions:

(1) I thought you said during the meeting that it was taking a minute, but 
all of the benchmark timings here are under a second. Is it only the 
benchmark setup that's expensive?
(2) Is this with the lock-free event and run queues? If not, how much do 
they help?
(3) As an aside (this has come up before): it would be useful to be able to 
force messages through the remote stack rather than the local stack. No need 
to tackle this yet, but keep in mind that it is a source of inaccuracy in 
this benchmark.
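
To make question (1) concrete, here is a rough sketch that compares the reported reregistration time with the gtest wall time for each case. The numbers are copied from the first run in the quoted Testing output below; the names `runs` and `overhead_fraction` are illustrative and not part of the patch. Time outside the measured window is assumed to be mostly setup, though it also includes teardown.

```python
# (reported reregistration time in ms, gtest wall time in ms),
# first run at commit 41193181, copied from the quoted Testing output.
runs = {
    "2000 agents, 500000 running + 500000 completed": (45.075488, 48126),
    "2000 agents, 1000000 running": (14.172361, 45979),
    "20000 agents, 1000000 running": (413.508328, 49487),
}


def overhead_fraction(measured_ms, wall_ms):
    """Fraction of the test's wall time spent outside the measured window."""
    return (wall_ms - measured_ms) / wall_ms


for name, (measured_ms, wall_ms) in runs.items():
    fraction = overhead_fraction(measured_ms, wall_ms)
    print(f"{name}: {fraction:.1%} of wall time is outside the measurement")
```

In all three cases the measured reregistration window is under one percent of the wall time, which is what prompts the question about setup cost.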

- Benjamin Mahler


On Oct. 19, 2017, 11:28 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63174/
> -----------------------------------------------------------
> 
> (Updated Oct. 19, 2017, 11:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler, Dmitry Zhuk, and Ilya Pronin.
> 
> 
> Bugs: MESOS-8098
>     https://issues.apache.org/jira/browse/MESOS-8098
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The current benchmark is very simple: it involves no frameworks and no 
> agent retries. It's possible to add a number of other benchmarks, so I am 
> creating a new file for them.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 936bc49ddfca03b9278ab11b6d317f3ff635cb00 
>   src/tests/CMakeLists.txt 386e0473c93d0a993248c7818067071d0c761c76 
>   src/tests/master_benchmarks.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63174/diff/1/
> 
> 
> Testing
> -------
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/41193181d6b75eeecae2729bf98007d9318e351a
>  (close to current HEAD).
> 
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 45.075488ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48126 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 14.172361ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (45979 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 413.508328ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (49487 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (143596 ms total)
> 
> ...
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 32.787363ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (48266 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 19.735003ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (46169 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 321.267267ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (51550 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (145987 ms total)
> ```
> 
> Benchmark based off 
> https://github.com/apache/mesos/commit/d9c90bf1d9c8b3a7dcc47be0cb773efff57cfb9d
>  (before https://issues.apache.org/jira/browse/MESOS-7713 was merged)
> ```
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 85.800335ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (59247 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 35.342066ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (93662 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 798.738642ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (116078 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (268987 ms total)
> 
> ...
> 
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0
> Reregistered 2000 agents with a total of 500000 running tasks and 500000 completed tasks in 66.270249ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/0 (59925 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1
> Reregistered 2000 agents with a total of 1000000 running tasks and 0 completed tasks in 50.146349ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/1 (88631 ms)
> [ RUN      ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2
> Reregistered 20000 agents with a total of 1000000 running tasks and 0 completed tasks in 807.621964ms
> [       OK ] AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test.AgentReregistrationDelay/2 (109941 ms)
> [----------] 3 tests from AgentFrameworkTaskCount/MasterFailover_BENCHMARK_Test (258497 ms total)
> ```
> 
> The recent patches cut the reregistration time by roughly 50%. Both 
> versions were built with `--enable-optimize`.
> 
> I can also get some flame graphs.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>
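
The speedup claimed in the quoted Testing section can be checked against the reported numbers. The sketch below is only a sanity check, not part of the patch: it averages the two runs per case (a choice made here, given only two samples each) and computes the fractional drop in reregistration time. The names `before`, `after`, and `reduction` are illustrative.

```python
from statistics import mean

# Reported reregistration times in ms, two runs per case, copied from the
# quoted Testing output. Keys are (agents, running tasks, completed tasks).
before = {  # commit d9c90bf1, before MESOS-7713 was merged
    (2000, 500000, 500000): [85.800335, 66.270249],
    (2000, 1000000, 0): [35.342066, 50.146349],
    (20000, 1000000, 0): [798.738642, 807.621964],
}
after = {  # commit 41193181, close to HEAD at the time
    (2000, 500000, 500000): [45.075488, 32.787363],
    (2000, 1000000, 0): [14.172361, 19.735003],
    (20000, 1000000, 0): [413.508328, 321.267267],
}


def reduction(before_ms, after_ms):
    """Fractional drop in mean time going from before_ms to after_ms."""
    return 1.0 - mean(after_ms) / mean(before_ms)


for case in before:
    print(case, f"{reduction(before[case], after[case]):.1%}")
```

With only two runs per case the run-to-run variance is large, so these percentages are a rough indication rather than a precise measurement.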
