Hi,

tl;dr: We are proposing to add two new V1 scheduler API calls, unsuppress and clear_filter, in order to decouple the dual semantics of the current revive call.
As pointed out in the Mesos framework scalability guide <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>, using the suppress <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress> call is the key to scaling your cluster to a large number of frameworks <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>. In short, when a framework is idle with no intention of launching any tasks, it should suppress to tell Mesos to stop sending it offers, and it should revive <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive> when new work arrives. This way, the allocator skips the framework when performing resource allocations. As a result, thorny issues such as offer starvation and resource fragmentation are greatly mitigated.

That said, the suppress/revive calls are currently a bit unwieldy due to MESOS-9028 <https://issues.apache.org/jira/browse/MESOS-9028>: the revive call has two semantics. It unsuppresses the framework AND clears all existing filters. The latter makes the revive call non-idempotent. Also, users may sometimes want to keep the existing filters when reviving, which is not possible at the moment.

To decouple the semantics, as suggested in the ticket, we propose to add two new V1 scheduler calls:

(1) The `UNSUPPRESS` call requests Mesos to resume sending offers;
(2) The `CLEAR_FILTER` call explicitly clears all existing filters.

To make life easier, both calls will return 200 OK (as opposed to the 202 returned by most existing scheduler calls, including `SUPPRESS` and `REVIVE`).

We will keep the revive call and its semantics (i.e. unsuppress AND clear filters) for backward compatibility.

Note that the changes are proposed for the V1 API only.
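To make the intended split concrete, here is a rough sketch of the semantics as a toy state model. It is not Mesos code: the `FrameworkState` class, its fields, and the agent ID strings are made up for illustration, and the real allocator tracks far more than this. It only shows how the existing revive maps onto the two proposed calls.

```python
class FrameworkState:
    """Toy, hypothetical model of the per-framework state the calls touch."""

    def __init__(self):
        self.suppressed = False   # True while the framework receives no offers
        self.filters = set()      # stand-in for offer filters, keyed by agent ID

    def suppress(self):
        # Existing call: stop sending offers to this framework.
        self.suppressed = True

    def unsuppress(self):
        # Proposed call: resume offers only; filters are left untouched.
        self.suppressed = False

    def clear_filter(self):
        # Proposed call: drop all filters only; suppression is left untouched.
        self.filters.clear()

    def revive(self):
        # Existing call, kept for backward compatibility: does both at once.
        self.unsuppress()
        self.clear_filter()


# Usage: a framework that wants offers again but keeps its filters.
fw = FrameworkState()
fw.filters.add("agent-1")   # hypothetical agent the framework has filtered
fw.suppress()
fw.unsuppress()             # filters survive; calling it again changes nothing
fw.revive()                 # old behaviour: the filter on "agent-1" is gone too
```

In this model, repeating `unsuppress` leaves the state unchanged, whereas each `revive` also discards whatever filters were installed since the previous call, which is the coupling the proposal aims to remove.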
Once the changes land, framework developers are encouraged to move to the V1 API to take advantage of the new calls (among many other benefits).

Any feedback/comments are welcome.

-Meng