Hi folks,

One of the long-standing issues with running many frameworks on Mesos is what is called "offer starvation": a role/framework that has unsatisfied demand receives no offers, while Mesos continually sends offers to other roles/frameworks that don't want them. This was originally captured via:
https://issues.apache.org/jira/browse/MESOS-3202

It's currently not possible to program a well-behaved scheduler to avoid this issue, since the only mechanisms schedulers have today are to SUPPRESS if they have no work to do and otherwise to filter unneeded offers for a timeout. However, a scheduler that has short-lived workloads must REVIVE frequently (which clears all of its filters). With a sufficient number of these frameworks, Mesos may not be able to allocate all the available resources. See the Background section of the document for more details.

This document goes over the background of the issue and covers various solutions for addressing it. Some of them are longer term and would merit their own design doc:

https://docs.google.com/document/d/1uvTmBo_21Ul9U_mijgWyh7hE0E_yZXrFr43JIB9OCl8

The current thinking is that it would be simplest in the short term to provide an alternative sorter to DRF that can be chosen when starting the master (e.g. random). In the medium term, we may add demand-awareness, and in the long term migrate to shared state scheduling.

Please share any feedback or questions, thanks!

Ben