Thanks to those who joined: Yan Xu, Chun-Hung Hsiao, Meng Zhu, Carl Dellar

Notes:

(1) I forgot to mention during the meeting that more progress has been made on parallel reads of master state for the other read-only endpoints. Alex or Benno can reply to this thread to provide an update. [1]

(2) Work is ongoing to improve allocation cycle performance [2]:

    (a) The patches that make the Resources wrapper copy-on-write are ready to land [3]. These improve the performance of common filtering operations. Meng presented some allocation cycle data based on this:

        https://docs.google.com/spreadsheets/d/1GmBdialteknPDf8IdumzPbF4bGmu75mVHIpiXLf3xHc

        The data shows that copy-on-write Resources improves allocation cycle time significantly, but as the number of frameworks grows, the Sorter starts to dominate the time spent in the allocation cycle and the relative benefit decreases.

    (b) To improve allocation cycle time further by addressing the sorter performance issues, I sent out / will send out a few patches [4]. The two that provide the most benefit are: introducing an efficient ScalarResourceQuantities type to make the sort itself faster, and avoiding dirtying the sorter upon allocation so that the allocation cycle doesn't have to keep re-sorting. The latter requires an additional change to update the usage of framework sorters so that the total they use is the entire cluster rather than the role allocation.

(3) There have also been significant improvements to the master's offer fan-out path [5]. We don't yet have a benchmark for this, but I'll try to demonstrate the improvement in 1.8.

(4) Meng showed a new allocator benchmark test fixture that Kapil worked on. It makes it easier to set up a "cluster" with a particular configuration and to measure allocator scenarios of interest [6].

(5) We chatted briefly about the master's call ingestion performance. There's a benchmark [7] that uses the reconciliation call to send a big message, and Ilya looked into the results some time ago, but we should revisit it and gather performance data.

(6) I'm nearly done with the 1.7.0 performance blog post, just waiting on some data from Alex / Benno.

Agenda Doc:
https://docs.google.com/document/d/12hWGuzbqyNWc2l1ysbPcXwc0pzHEy4bodagrlNGCuQU

Ben

[1] https://issues.apache.org/jira/browse/MESOS-9158
[2] https://issues.apache.org/jira/browse/MESOS-9087
[3] https://issues.apache.org/jira/browse/MESOS-6765
[4] https://issues.apache.org/jira/browse/MESOS-9239
[5] https://issues.apache.org/jira/browse/MESOS-9234
[6] https://issues.apache.org/jira/browse/MESOS-9187
[7] https://github.com/apache/mesos/blob/1.7.0/src/tests/scheduler_tests.cpp#L2164-L2230
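
For anyone unfamiliar with the copy-on-write idea behind (2)(a), here is a rough, generic sketch of why it can make filtering cheap. It is not the Mesos code: the names Resource, CowResources, filter and with_role are hypothetical, and it uses a simplified variant in which a write returns a new collection rather than mutating in place. Reads share the existing entries by reference, and only the entry that actually changes gets copied.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for one resource entry (not the Mesos type)."""
    name: str
    role: str
    amount: float


class CowResources:
    """Hypothetical copy-on-write collection of Resource entries."""

    def __init__(self, entries=()):
        # Only references are stored; the entries themselves are shared.
        self._entries = tuple(entries)

    def filter(self, predicate):
        # Reads share entries with the source: no entry is copied.
        return CowResources(e for e in self._entries if predicate(e))

    def with_role(self, name, role):
        # Writes copy only the entries that change; the rest stay shared.
        return CowResources(
            replace(e, role=role) if e.name == name else e
            for e in self._entries
        )

    def __iter__(self):
        return iter(self._entries)


total = CowResources([
    Resource("cpus", "*", 8.0),
    Resource("mem", "*", 4096.0),
    Resource("disk", "dev", 100.0),
])

# Filtering builds a new collection that points at the same entries.
unreserved = total.filter(lambda r: r.role == "*")

# Changing one entry copies that entry only; "mem" is still shared.
reserved = unreserved.with_role("cpus", "dev")
```

In this sketch the cost of a filter is proportional to the number of references kept, not to the size of each entry. An in-place variant could skip the copy on write whenever an entry is not shared with any other collection.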