[ 
https://issues.apache.org/jira/browse/MESOS-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744576#comment-16744576
 ] 

Greg Mann commented on MESOS-9509:
----------------------------------

I have a repository here containing some tooling for benchmarking Mesos checks 
by launching a variable number of tasks in a single pod using 
{{mesos-execute}}: [https://github.com/greggomann/mesos-healthcheck-benchmark]

The main results I've produced so far show how the overall check rate and check 
responsiveness vary with the number of tasks in the pod:
 !check-rate.png|width=544,height=408!
 !check-responsiveness.png|width=544,height=408! 

In the above tests, the check interval was set to zero and the timeout was set 
to 5 minutes so that all checks would be launched again immediately once they 
completed.
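
As a rough illustration of the check settings described above, here is a minimal Python sketch that builds a task group JSON with one command check per task. It is not taken from the benchmark repository. The helper name {{build_task_group}}, the task names, the resource values and the task/check commands are all hypothetical placeholders. It also assumes the generic {{check}} field ({{CheckInfo}}) rather than {{health_check}}, and that the resulting JSON would be handed to {{mesos-execute}} as a task group.

```python
import json


def build_task_group(num_tasks, interval_seconds=0.0, timeout_seconds=300.0):
    """Return a TaskGroupInfo-shaped dict with one command check per task.

    Hypothetical helper for illustration only; the defaults mirror the
    settings quoted above (interval of zero, timeout of 5 minutes).
    """
    tasks = []
    for i in range(num_tasks):
        name = "benchmark-task-{}".format(i)  # hypothetical naming scheme
        tasks.append({
            "name": name,
            "task_id": {"value": name},
            "agent_id": {"value": ""},  # placeholder value
            "resources": [  # placeholder resource sizes
                {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.01}},
                {"name": "mem", "type": "SCALAR", "scalar": {"value": 32}},
            ],
            "command": {"value": "sleep 3600"},  # placeholder task command
            "check": {
                "type": "COMMAND",
                # Placeholder check command that always succeeds.
                "command": {"command": {"value": "true"}},
                # Zero interval: relaunch the check as soon as it completes.
                "interval_seconds": interval_seconds,
                # 5 minute timeout so that slow checks are not cut short.
                "timeout_seconds": timeout_seconds,
            },
        })
    return {"tasks": tasks}


if __name__ == "__main__":
    # Print a three-task pod definition as JSON.
    print(json.dumps(build_task_group(3), indent=2))
```

Varying {{num_tasks}} in a sketch like this corresponds to the "number of tasks in the pod" axis in the plots above.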

I have perf traces from these tests as well, and I'll update this ticket with 
flame graphs once I have generated them. I'd also like to analyze the logs to 
determine how long the agent spends in each stage of check container 
launch.

For now I'm moving this ticket back to Accepted; I or someone else can 
pick it back up when time allows, as I believe there's much more work to do 
here.

> Benchmark command health checks in default executor
> ---------------------------------------------------
>
>                 Key: MESOS-9509
>                 URL: https://issues.apache.org/jira/browse/MESOS-9509
>             Project: Mesos
>          Issue Type: Task
>          Components: executor
>            Reporter: Vinod Kone
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: default-executor, foundations, mesosphere, perfomance
>         Attachments: check-rate.png, check-responsiveness.png
>
>
> TCP/HTTP health checks were extensively scale tested as part of 
> https://mesosphere.com/blog/introducing-mesos-native-health-checks-apache-mesos-part-2/.
>  
> We should do the same for command checks run by the default executor, because 
> they use a very different mechanism (the agent fork/execs the check command as 
> a nested container) and will have very different scalability characteristics.
> We should also use these benchmarks as an opportunity to produce perf traces 
> of the Mesos agent (both with and without process inheritance) so that a 
> thorough analysis of the performance can be done as part of MESOS-9513.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
