[ https://issues.apache.org/jira/browse/MESOS-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969499#comment-16969499 ]
Andrei Sekretenko commented on MESOS-6405:
------------------------------------------

I ran both benchmarks (the existing SchedulerReconcileTasks_BENCHMARK_Test and Anand's r53113) on the current master head. Both show noticeable added overhead for the V1 API (~10x lower throughput).

It should be noted that both benchmarks run against an empty Mesos master, i.e. what they show is basically the overhead due to HTTP, (de)serialization, authentication, etc.

It turns out that the issues exposed by these two benchmarks are totally different, which is not surprising at all: the first sends a single call and receives a multitude of events, whereas the second sends one call repeatedly but receives no API events.

The largest issue that shows up in SchedulerReconcileTasks_BENCHMARK_Test is inefficiency in the V1 C++ scheduler client library (spawning/terminating an AsyncExecutor process for each event, and so on). See [^SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary_flamegraph.svg]

The benchmark from r53113 (SchedulerCallIngestion_BENCHMARK_Test) shows a surprisingly large overhead from using process::Sequence (master only?) and HttpProxy (both sides?), and also from per-request authentication (on both sides). Compare: [^SchedulerCallIngestion_BENCHMARK_Test.SchedulerLibrary_flamegraph.svg] and [^SchedulerCallIngestion_BENCHMARK_Test.SchedulerDriver_flamegraph.svg]

> Benchmark call ingestion path on the Mesos master.
> --------------------------------------------------
>
>                 Key: MESOS-6405
>                 URL: https://issues.apache.org/jira/browse/MESOS-6405
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master, scheduler api
>            Reporter: Anand Mazumdar
>            Assignee: Anand Mazumdar
>            Priority: Critical
>              Labels: mesosphere
>         Attachments:
>           SchedulerCallIngestion_BENCHMARK_Test.SchedulerDriver_flamegraph.svg,
>           SchedulerCallIngestion_BENCHMARK_Test.SchedulerDriver_stacks.gz,
>           SchedulerCallIngestion_BENCHMARK_Test.SchedulerLibrary_flamegraph.svg,
>           SchedulerCallIngestion_BENCHMARK_Test.SchedulerLibrary_stacks.gz,
>           SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary_flamegraph.svg,
>           SchedulerReconcileTasks_BENCHMARK_Test.SchedulerLibrary_stacks.gz
>
>
> [~drexin] reported on the user mailing
> [list|http://mail-archives.apache.org/mod_mbox/mesos-user/201610.mbox/%3C6B42E374-9AB7-4444-A315-A6558753E08B%40apple.com%3E]
> that there seems to be a significant regression in performance on the call
> ingestion path on the Mesos master with respect to the scheduler driver (v0 API).
> We should create a benchmark to first get a sense of the numbers and then go
> about fixing the performance issues.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)