See my comments inline.

On Mon, Dec 3, 2018 at 5:43 PM Vinod Kone <vinodk...@apache.org> wrote:

> Thanks Meng for the explanation.
>
> I imagine most frameworks do not remember what stuff they filtered, much
> less figure out how previously filtered stuff can satisfy new operations.
> That sounds complicated!
>

Frameworks do not need to remember which filters they currently have.
Knowing only the resource profiles of the current vs. the previous
operation would already help a lot. But yeah, even this may be too much
complexity.
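
To make that concrete, here is a rough Python sketch of the keep-vs-clear
check. It assumes a resource profile is just a map from resource name
(e.g. "cpus", "mem") to the amount a task needs; `should_clear_filters` is
a hypothetical helper, not a Mesos API, and it ignores non-scalar
constraints such as attributes or placement.

```python
from typing import Mapping


def should_clear_filters(previous: Mapping[str, float],
                         current: Mapping[str, float]) -> bool:
    """Hypothetical helper: decide whether existing filters are stale.

    Filters built while looking for `previous` rejected offers that could
    not satisfy it. If `current` asks for the same or more of every
    resource, those offers still cannot satisfy it, so the filters remain
    useful. If any requirement is loosened, a filtered offer might now fit.
    """
    for name, needed_before in previous.items():
        # Needing less of a resource (or none at all) loosens a requirement.
        if current.get(name, 0.0) < needed_before:
            return True
    return False
```

With this, a replacement task with the same profile keeps the filters,
while a smaller task (or one that drops a resource) clears them.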

>
> But I like your example. So one suggestion we could make to frameworks is
> to use CLEAR_FILTERS when they have new work, e.g., scale up/down, a new
> app (they might want to use this even if they aren't suppressed!), and to
> use UNSUPPRESS when they are rescheduling old work?
>

Yeah, these are the general guidelines.

I want to echo and re-emphasize that CLEAR_FILTERS is orthogonal to
suppression. Frameworks should consider clearing filters regardless of
whether they are suppressed.
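
Put as code, the guideline is two independent checks. This is only a
Python sketch: `Work` and `calls_for` are hypothetical names, and
`UNSUPPRESS`/`CLEAR_FILTERS` are the calls proposed in this thread, not
calls in a released API.

```python
from enum import Enum
from typing import List


class Work(Enum):
    NEW = "new"                # e.g. scale up/down, or a new app
    RESCHEDULE = "reschedule"  # e.g. replacing a failed task, same profile


def calls_for(work: Work, suppressed: bool) -> List[str]:
    """Return the (proposed) scheduler calls to send when work arrives."""
    calls: List[str] = []
    # Suppression only controls whether offers are sent at all.
    if suppressed:
        calls.append("UNSUPPRESS")
    # Clearing filters depends only on whether the work is new,
    # independently of the suppression state.
    if work is Work.NEW:
        calls.append("CLEAR_FILTERS")
    return calls
```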

Ideally, when there is new, different work, old irrelevant filters should
be cleared. This helps the framework get more offers and makes the
allocator run faster (filters can take up the bulk of allocation time as
they build up). On the flip side, calling CLEAR_FILTERS too often might
also have performance implications (especially if the master/allocator
actors are already stressed).
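
One possible way for a framework to balance the two is to skip
CLEAR_FILTERS if it already sent one very recently. The sketch below
assumes a caller-supplied `send` callback that issues the call; the class
name and the interval are illustrative, not anything Mesos provides.
Skipping (rather than queueing) trades a few possibly missed offers for
fewer calls to the master.

```python
import time
from typing import Callable, Optional


class ClearFiltersThrottle:
    """Hypothetical client-side guard: send at most one CLEAR_FILTERS per interval."""

    def __init__(self, send: Callable[[], None], min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent: Optional[float] = None

    def request(self) -> bool:
        """Send CLEAR_FILTERS unless one was sent within the interval."""
        now = self._clock()
        if (self._last_sent is not None
                and now - self._last_sent < self._min_interval_s):
            return False  # filters were cleared recently; skip this call
        self._send()
        self._last_sent = now
        return True
```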

Thoughts?
>
> On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu <m...@mesosphere.com> wrote:
>
> > Hi Vinod:
> >
> > Yeah, `CLEAR_FILTERS` sounds good.
> >
> > UNSUPPRESS should be used whenever a currently suppressed framework
> > wants to resume getting offers after a previous SUPPRESS call.
> >
> > As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> > call it whenever the framework wants to clear all the existing filters.
> >
> > To elaborate, frameworks decline offers and accumulate filters while
> > trying to satisfy a particular set of requirements/constraints to perform
> > an operation. Once that operation is done and the next one comes, if the
> > new operation has the same (or strictly more) resource
> > requirements/constraints compared to the last one, then it is more
> > efficient to KEEP the existing filters instead of receiving useless offers
> > and rebuilding the filters.
> >
> > On the other hand, if the requirements/constraints are different (i.e.,
> > some of the previous requirements could be loosened), then the existing
> > filters no longer make sense. In that case it might be a good idea to
> > clear all the existing filters to improve the chance of getting more
> > offers.
> >
> > Note that, although we are introducing `CLEAR_FILTERS` as part of
> > decoupling the `REVIVE` call, its usage should be independent of
> > suppression/revival. The decision to clear the filters depends only on
> > whether the existing filters make sense for the current operation's
> > constraints/requirements.
> >
> > Examples:
> > If a framework first launches a task, then wants to launch a replacement
> > task (because the first task failed), then it should keep the filters
> > built up during the first launch. However, if the framework wants to
> > launch a second task with a completely different resource profile, then
> > clearing filters might help to get more (otherwise filtered) offers and
> > hence speed up the deployment.
> >
> > -Meng
> >
> > On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone <vinodk...@apache.org> wrote:
> >
> > > Hi Meng,
> > >
> > > What would be the recommendation for framework authors on when to use
> > > UNSUPPRESS vs CLEAR_FILTER?
> > >
> > > Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> > >
> > > On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu <m...@mesosphere.com> wrote:
> > >
> > >> Hi:
> > >>
> > >> tl;dr: We are proposing to add two new V1 scheduler APIs, unsuppress
> > >> and clear_filter, in order to decouple the dual semantics of the
> > >> current revive call.
> > >>
> > >> As pointed out in the Mesos framework scalability guide
> > >> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
> > >> utilizing the suppress
> > >> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> > >> call is key to scaling your cluster to a large number of frameworks
> > >> <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
> > >> In short, when a framework is idle with no intention to launch any
> > >> tasks, it should suppress to tell Mesos to stop sending it offers. The
> > >> framework should then revive
> > >> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> > >> when new work arrives. This way, the allocator skips the framework
> > >> while it is suppressed when performing resource allocations. As a
> > >> result, thorny issues such as offer starvation and resource
> > >> fragmentation would be greatly mitigated.
> > >>
> > >> That being said, the suppress/revive calls are currently a little
> > >> unwieldy due to MESOS-9028
> > >> <https://issues.apache.org/jira/browse/MESOS-9028>:
> > >>
> > >> The revive call has two semantics: it unsuppresses the framework AND
> > >> clears all the existing filters. The latter makes the revive call
> > >> non-idempotent. Also, users sometimes want to keep the existing
> > >> filters when reviving, which is not possible at the moment.
> > >>
> > >> To decouple the semantics, as suggested in the ticket, we propose to
> > >> add two new V1 scheduler calls:
> > >>
> > >> (1) `UNSUPPRESS` requests that Mesos resume sending offers;
> > >> (2) `CLEAR_FILTER` explicitly clears all the existing filters.
> > >>
> > >> To make life easier, both calls will return 200 OK (as opposed to the
> > >> 202 Accepted returned by most existing scheduler calls, including
> > >> `SUPPRESS` and `REVIVE`).
> > >>
> > >> We will keep the revive call and its semantics (i.e., unsuppress AND
> > >> clear filters) for backward compatibility.
> > >>
> > >> Note that the changes are proposed for the V1 API only. Once they
> > >> land, framework developers are encouraged to move to the V1 API to
> > >> take advantage of the new calls (among many other benefits).
> > >>
> > >> Any feedback/comments are welcome.
> > >>
> > >> -Meng
> > >>
> > >
> >
>
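
For reference, the split proposed in the original message can be modeled
as a tiny state machine. This Python sketch assumes a single role and
represents each filter as an opaque string; `OfferState` and `apply_call`
are hypothetical names, and the model only illustrates the intended
semantics (REVIVE = UNSUPPRESS + CLEAR_FILTERS), not how the Mesos
allocator actually stores filters.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class OfferState:
    """Hypothetical per-framework view of what the allocator tracks."""
    suppressed: bool = False
    filters: Set[str] = field(default_factory=set)


def apply_call(state: OfferState, call: str) -> None:
    if call == "SUPPRESS":
        state.suppressed = True
    elif call == "UNSUPPRESS":     # proposed: resume offers, keep filters
        state.suppressed = False
    elif call == "CLEAR_FILTERS":  # proposed: drop filters, suppression unchanged
        state.filters.clear()
    elif call == "REVIVE":         # existing: both of the above at once
        state.suppressed = False
        state.filters.clear()
    else:
        raise ValueError(f"unsupported call: {call}")
```

Suppressing and then unsuppressing leaves a framework's filters intact,
which REVIVE alone cannot do today.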
