Hi folks,

Here are some notes from today's performance meeting.

(1) First, I gave a demo of FlameScope, which you can find here:

https://github.com/Netflix/flamescope

It's a very useful tool, and hopefully we can make it easier for users to generate data that can be dropped into FlameScope when reporting performance issues. One open question is how `perf record --call-graph dwarf` compares to `perf record -g` with Mesos compiled with frame pointers. I haven't had time to check this yet.

While playing with the tool, it was easy to find some hot spots in the cluster I was looking at (which was not necessarily representative). For the agent, Jie filed:

https://issues.apache.org/jira/browse/MESOS-8901

For the master, I noticed that metrics, state JSON generation (no surprise), and a particular spot in the allocator were very expensive.

We'd like to address metrics by migrating to push gauges (Zhitao has offered to help with this effort):

https://issues.apache.org/jira/browse/MESOS-8914

We'd like to address state generation by streaming state into a separate actor (and providing filtering as well). This will be investigated further and prioritized very soon:

https://issues.apache.org/jira/browse/MESOS-8345

(2) Kapil discussed benchmarks for the long-standing "offer starvation" issue:

https://issues.apache.org/jira/browse/MESOS-3202

I'll send out an email or document soon with some background on this issue as well as our options for addressing it.

Let me know if you have any questions or feedback!

Ben