Re: New scheduler API proposal: unsuppress and clear_filter

Benjamin Bannier Mon, 10 Dec 2018 08:58:34 -0800

Hi Ben et al.,

I'd expect frameworks to *always* know how to accept or decline offers in 
general. More involved frameworks might know how to suppress offers. I don't 
expect that any framework models filters and their associated durations in 
detail (that's why I called them a Mesos implementation detail) since there is 
not much benefit to a framework's primary goal of running tasks as quickly as 
possible.


> I couldn't quite tell how you were imagining this would work, but let me 
> spell out the two models that I've been considering, and you can tell me if 
> one of these matches what you had in mind or if you had a different model in 
> mind:

> (1) "Effective limit" or "give me this much more" ...

This sounds more like an operator-type than a framework-type API to me. I'd 
assume that frameworks would not worry about their total limit the way an 
operator would, but instead care about getting resources to run a certain task 
at a point in time. I could also imagine this being easy to use incorrectly as 
frameworks would likely need to understand their total limit when issuing the 
call which could require state or coordination among internal framework 
components (think: multi-purpose frameworks like Marathon or Aurora).

> (2) "Matchers" or "give me things that look like this": when a scheduler 
> expresses its "request" for a role, it would act as a "matcher" (opposite of 
> filter). When mesos is allocating resources, it only proceeds if 
> (requests.matches(resources) && !filters.filtered(resources)). The open ended 
> aspect here is what a matcher would consist of. Consider a case where a 
> matcher is a resource quantity and multiple are allowed; if any matcher 
> matches, the result is a match. This would be equivalent to letting 
> frameworks specify their own --min_allocatable_resources for a role (which is 
> something that has been considered). The "matchers" could be more 
> sophisticated: full resource objects just like filters (but global), full 
> resource objects but with quantities for non-scalar resources like ports, etc.

I was thinking in this direction, but what you described is more involved than 
what I had in mind as a possible first attempt. I'd expect that frameworks 
currently use `REVIVE` as a proxy for `REQUEST_RESOURCES`, not as a way to 
manage their filter state tracked in the allocator. Assuming we have some way 
to express resource quantities (i.e., MESOS-9314), we should be able to improve 
on `REVIVE` by providing a `REQUEST_RESOURCES` which clears all filters for 
resources containing the requested resources (or all filters if there is no 
explicit resource request). Even if that led to more offers than needed, it 
would likely still perform better than `REVIVE` (or `CLEAR_FILTERS`, which has 
similar semantics). If we keep the scope of these calls narrow and clear, we 
have the freedom to be smarter internally in the future.
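
To make the intended semantics concrete, here is a rough sketch in Python. It
is a toy model, not the Mesos allocator: the `Allocator` class, its per-agent
filter set, and the `request_resources` method are all hypothetical names, and
resources are reduced to plain scalar quantities.

```python
from dataclasses import dataclass, field


@dataclass
class Allocator:
    """Toy model of per-framework offer filters; not the Mesos allocator."""

    # agent id -> scalar resource quantities available on that agent
    agents: dict
    # agent ids for which the framework currently has a decline filter
    filtered: set = field(default_factory=set)

    def decline(self, agent_id):
        # Declining an offer installs a filter for that agent.
        self.filtered.add(agent_id)

    def revive(self):
        # REVIVE: drop every filter, regardless of what the framework needs.
        self.filtered.clear()

    def request_resources(self, wanted=None):
        # Hypothetical REQUEST_RESOURCES: with no explicit request, behave
        # like REVIVE; otherwise clear only the filters on agents whose
        # resources contain the requested quantities.
        if not wanted:
            self.revive()
            return
        for agent_id in list(self.filtered):
            available = self.agents[agent_id]
            if all(available.get(name, 0) >= qty for name, qty in wanted.items()):
                self.filtered.discard(agent_id)

    def offerable(self):
        # Agents the framework could be offered again.
        return sorted(a for a in self.agents if a not in self.filtered)


allocator = Allocator(agents={
    "agent-1": {"cpus": 1, "mem": 1024},
    "agent-2": {"cpus": 8, "mem": 16384},
})
allocator.decline("agent-1")
allocator.decline("agent-2")
allocator.request_resources({"cpus": 4, "mem": 4096})
print(allocator.offerable())
```

In this sketch only the filter on `agent-2` is cleared, since `agent-1` could
not satisfy the request anyway; a plain `REVIVE` would have cleared both.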

This should not only be pretty straightforward to implement in Mesos, but I'd 
imagine it would also map pretty well onto framework use cases (i.e., I assume 
frameworks are interested in controlling the resources they are offered, not in 
managing filters we maintain for them).

> With regard to incentives, the incentive today for adhering to suppress is 
> that your framework will be doing less processing of offers when it has no 
> work to do and that other instances of your own framework as well as other 
> frameworks would get resources faster. The second aspect is indeed indirect. 
> The incentive structure with "request" / "demand" does indeed seem to be more 
> direct (while still having the indirect benefit on other frameworks / roles): 
> "I'll tell you what to show me so that I get it faster".

Additionally, by potentially explicitly introducing filters as a framework API 
concept, we ask the majority of framework authors to reason about an aspect 
they didn't have to worry about up until then (previously: "if work arrives, 
revive, and decline until an offer can be accepted, then suppress"). If we 
provided them with something which fits their *current mental model* while also 
giving them more control, we would have a higher chance of it being globally 
useful and adopted than if we added an expert-level knob.
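
The mental model I have in mind can be sketched roughly as follows. This is
only an illustration: `FrameworkModel` and its methods are hypothetical, the
call names are recorded as plain strings rather than sent to Mesos, and tasks
are reduced to a single CPU requirement.

```python
class FrameworkModel:
    """Toy scheduler following 'revive, decline until accept, suppress'."""

    def __init__(self):
        self.pending = []  # cpus needed by each queued task
        self.calls = []    # calls this scheduler would send to Mesos

    def add_task(self, cpus):
        # Work arrived while idle: ask for offers again.
        if not self.pending:
            self.calls.append("REVIVE")
        self.pending.append(cpus)

    def on_offer(self, offered_cpus):
        if self.pending and offered_cpus >= self.pending[0]:
            # The offer fits the next task: accept it.
            self.pending.pop(0)
            self.calls.append("ACCEPT")
            if not self.pending:
                # No work left: stop receiving offers.
                self.calls.append("SUPPRESS")
        else:
            # Offer is too small (or there is nothing to run): decline it.
            self.calls.append("DECLINE")


fw = FrameworkModel()
fw.add_task(cpus=4)
fw.on_offer(offered_cpus=2)
fw.on_offer(offered_cpus=8)
print(fw.calls)
```

Note that nothing in this loop needs to know about filters or their durations;
the framework only reacts to work arriving and to offers.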

> However, as far as performance is concerned, we still need suppress adoption 
> and not just request adoption. Suppress is actually the bigger performance 
> win at the current time, unless we think that frameworks with no work would 
> "effectively suppress" via requests (e.g. "no work? set a 0 request so 
> nothing matches"). Note though, that "effectively suppressing" via requests 
> has the same incentive structure as suppress itself, right?

I was also wondering about how what I suggested would fit here as we have two 
concepts controlling if and which offers a framework gets (a single global flag 
for suppress, and a zoo of many fine-grained filters). Currently we only expose 
`SUPPRESS`, `DECLINE`, and `REVIVE`. It seems that explicitly adding framework 
control over filters to that might restrict what we can do internally in the 
future. Right now the API gives us some freedom in how we interpret declines: 
we could, e.g., merge filters which expire at the same time, or even interpret 
filters on all cluster resources interchangeably with a suppressed state (the 
API would actually allow us to put a framework into a suppressed state, maybe 
for some time, even before it has seen all resources). If we exposed filters, 
we would lose some of that implementation freedom, and we should make sure it 
is worth it.
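
As a rough illustration of the kind of internal freedom I mean, here is a
small Python sketch. It assumes filters can be modeled as
`(agent_id, expires_at)` pairs, which is a simplification, and both helper
functions are hypothetical and do not correspond to existing allocator code.

```python
from collections import defaultdict


def merge_filters(filters):
    """Group (agent_id, expires_at) filters by their expiry time.

    Filters expiring at the same time collapse into one entry, so a
    single timer per expiry would be enough.
    """
    merged = defaultdict(set)
    for agent_id, expires_at in filters:
        merged[expires_at].add(agent_id)
    return dict(merged)


def effectively_suppressed(filters, all_agents, now):
    """True if unexpired filters cover every agent in the cluster.

    In that case the framework could be treated as suppressed until
    the earliest of those filters expires.
    """
    active = {agent_id for agent_id, expires_at in filters if expires_at > now}
    return active >= set(all_agents)


filters = [("agent-1", 100), ("agent-2", 100), ("agent-3", 250)]
print(merge_filters(filters))
print(effectively_suppressed(filters, ["agent-1", "agent-2", "agent-3"], now=50))
```

None of this is observable through `SUPPRESS`, `DECLINE`, and `REVIVE`, which
is exactly why we are free to do it today.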

As for incentives, if we finally added `REQUEST_RESOURCES` we'd allow 
frameworks to make their interaction with Mesos more declarative yet 
conceptually not much harder. Even if we (Mesos) weren't able to implement 
optimal handling right away, it could already be useful with an MVP 
implementation on the Mesos side. Also, it would open up potential for future 
optimizations with frameworks already "speaking the right protocol".



Cheers,

Benjamin
