Hi folks:

During the design review meetings, the main discussion point has been whether we should allow setting quota guarantees on resources with specific metadata. The main use case is disks with profiles.
The current proposal in the doc is to allow setting guarantees only on top-level resources such as "cpus" and "disk". Limits can be set on any resource, including resources with metadata. There is a caveat, though: if limits are set "underneath" a guarantee (e.g. a guarantee on disk co-exists with a limit on disk with a specific profile), the guarantee might not be satisfied, depending on cluster usage. This proposal also does not support the use case of setting quota for disks with profiles. Both this limitation and the caveat above stem from the quota propagation issue that arises when there is a resource metadata hierarchy, as explained in these related sections of the design doc <https://docs.google.com/document/d/13vG5uH4YVwM79ErBPYAZfnqYFOBbUy2Lym0_9iAQ5Uk/edit#heading=h.i4lsj45vylfu>.

It looks like there are a few options for setting quotas on disks with profiles:

1. Stick to the current proposal, but treat disk with a profile as a top-level resource (think of it as something completely unrelated to "disk", like "cpus"), so that guarantees can be set on it.
2. Add support for setting guarantees on any resource with metadata, but with a restriction: once a guarantee or a limit is set on a resource with metadata, no other quotas can be configured for resources on the same path in the metadata hierarchy. For example, disk, disk with a fast profile, and disk from vendor A are all considered resources on the same path. Once one of them has a quota, no other resource on that path can have one.
3. Add support for setting guarantees and limits on any resource with metadata, at the risk that some guarantees might not be satisfied.
4. Add support for setting guarantees and limits on any resource with metadata, and use a linear programming model to figure out how to satisfy all the quotas.
5. Stick to the current proposal and do not support setting quotas on disks with profiles.
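To make the caveat and the Option 2 restriction concrete, here is a rough Python sketch. It is not Mesos code: `ResourceKey`, `resource()`, `on_same_path()`, `option2_allows()` and `satisfiable_disk_guarantee()` are hypothetical names, quantities are in arbitrary units, and treating two resources as being "on the same path" when some disk could match both (same name, no metadata key set to conflicting values) is an assumed reading of the disk / fast disk / vendor-A disk example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceKey:
    """Hypothetical model: a resource name plus optional metadata.

    Not the Mesos Resource protobuf; just enough structure to show the
    hierarchy, e.g. disk, disk with profile=fast, disk with vendor=A.
    """
    name: str
    metadata: frozenset = frozenset()  # frozenset of (key, value) pairs


def resource(name, **metadata):
    """Convenience constructor, e.g. resource("disk", profile="fast")."""
    return ResourceKey(name, frozenset(metadata.items()))


def on_same_path(a, b):
    """Assumed reading of "same path": some concrete resource could match
    both a and b, i.e. same name and no metadata key set to different values."""
    if a.name != b.name:
        return False
    da, db = dict(a.metadata), dict(b.metadata)
    return all(da[k] == db[k] for k in da.keys() & db.keys())


def option2_allows(existing, new):
    """Option 2 sketch: reject a new guarantee or limit on `new` if any other
    resource on the same path already has a quota. Re-setting the quota on
    the exact same resource is still allowed."""
    return not any(other != new and on_same_path(other, new) for other in existing)


def satisfiable_disk_guarantee(guarantee, plain_free, fast_free, fast_limit):
    """Caveat sketch: a guarantee on all "disk" co-exists with a limit on
    disk with profile=fast. Only plain disk plus at most `fast_limit` of fast
    disk can be allocated toward the guarantee."""
    usable = plain_free + min(fast_free, fast_limit)
    return min(guarantee, usable)


if __name__ == "__main__":
    # A fast-disk quota blocks a quota on plain "disk" under Option 2 ...
    print(option2_allows([resource("disk", profile="fast")], resource("disk")))  # False
    # ... but not a quota on a disjoint profile.
    print(option2_allows([resource("disk", profile="fast")],
                         resource("disk", profile="slow")))  # True
    # Guarantee of 100 on disk, limit of 20 on fast disk, and only 50 plain
    # disk free: at most 70 can be allocated, so the guarantee is not met.
    print(satisfiable_disk_guarantee(100, plain_free=50, fast_free=100, fast_limit=20))  # 70
```

Note that the same-path check only rejects overlapping quotas; it does not attempt the propagation math that Option 4 would need to satisfy overlapping guarantees and limits together.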
Option 1 raises the question of whether we should treat resources like EBS as something completely different from vanilla local disk. If we do (as the option suggests), we need to update other parts of the system accordingly. For example, endpoints, metrics, the allocator, etc. should stop treating disk with a profile as "disk".

Option 2 seems too restrictive. It can be hard to reason about and unwieldy for users.

Option 3 would certainly be easy to use. But after setting up guarantees, users would expect them to be satisfied, which Mesos may not be able to deliver. When that happens, there is no easy explanation for why the guarantees are not satisfied.

Option 4 allows and enforces all the guarantees optimally. However, it is not clear what the performance implications of running an optimization solver would be. Also, since guarantees are not part of the long-term plan as we introduce priority tiers, we should ask whether it is worth the complexity and effort.

Option 5 essentially kicks the can down the road, since the use case for setting quotas on disks with profiles is not immediate. For the MVP, we could stick to the design proposal and prepare to extend it when the need arises (likely in the medium term).

Thoughts?

Thanks,
Meng

On Thu, Jan 24, 2019 at 9:58 AM Meng Zhu <[email protected]> wrote:
> After the API WG sync, we want to schedule a follow-up meeting to discuss
> Quota 2.0 further. If you are interested, please join us at 12:30pm PST
> today (Jan 24th) with the zoom link below. Sorry for the short notice.
>
> -Meng
>
> Join Zoom Meeting https://zoom.us/j/574632536
> One tap mobile +16699006833,,574632536# US (San Jose)
> +16465588656,,574632536# US (New York)
> Dial by your location
> +1 669 900 6833 US (San Jose)
> +1 646 558 8656 US (New York)
> Meeting ID: 574 632 536
> Find your local number: https://zoom.us/u/acZYnvuO63
>
> On Sun, Jan 20, 2019 at 8:07 PM Meng Zhu <[email protected]> wrote:
>
>> Hi folks:
>>
>> I am excited to propose Quota 2.0 for better resource management on
>> Mesos, with explicit limits (decoupled from guarantee), generic quota
>> (which can be set on resources with metadata and on more generic resources
>> such as the number of containers) and bright shiny new APIs.
>>
>> You can find the design doc here
>> <https://docs.google.com/document/d/13vG5uH4YVwM79ErBPYAZfnqYFOBbUy2Lym0_9iAQ5Uk/edit?usp=sharing>.
>> Please feel free to leave comments and suggestions.
>>
>> I have also put an agenda item for the upcoming API working group meeting
>> on Tuesday (Jan 22nd, 11am PST), please join if you are interested.
>>
>> Thanks,
>> Meng
