I've added JIRAs to:

1) Add master flag `--filter-gpu-resources={true|false}`:
https://issues.apache.org/jira/browse/MESOS-7576
2) Deprecate the GPU_RESOURCES capability and master flag `--filter-gpu-resources={true|false}`:
https://issues.apache.org/jira/browse/MESOS-7579
3) Remove the GPU_RESOURCES capability and master flag `--filter-gpu-resources={true|false}`:
https://issues.apache.org/jira/browse/MESOS-7577

Kevin

On Fri, May 26, 2017 at 1:49 PM Benjamin Mahler <bmah...@apache.org> wrote:

> I filed https://issues.apache.org/jira/browse/MESOS-7574 for reservations
> to multiple roles. We'll find one that captures the deprecation of the
> GPU_RESOURCES capability as well, with reservations to multiple roles as
> a blocker.
>
> On Fri, May 26, 2017 at 8:54 AM, Zhitao Li <zhitaoli...@gmail.com> wrote:
>
> > Hi Benjamin,
> >
> > Thanks for getting back. Do you have an issue already filed for the
> > "reservations to multiple roles" story, or is it folded under another
> > JIRA story?
> >
> > On Fri, May 26, 2017 at 12:44 AM, Benjamin Mahler <bmah...@apache.org>
> > wrote:
> >
> > > Thanks for the feedback!
> > >
> > > There have been some discussions about allowing reservations to
> > > multiple roles (or, more generally, role expressions), which is
> > > essentially what you've suggested, Zhitao. (However, note that what
> > > is provided by the GPU capability filtering is not quite this; it's
> > > actually analogous to a reservation for multiple schedulers, not
> > > roles.) Reservations to multiple roles seem to be the right
> > > replacement for those who rely on the GPU filtering behavior.
> > >
> > > Since we don't have reservations to multiple roles at this point, we
> > > shouldn't deprecate the GPU_RESOURCES capability until this is in
> > > place.
> > >
> > > With hierarchical roles, it's possible (although potentially clumsy)
> > > to achieve roughly what is provided by the GPU filtering using
> > > sub-roles, since reservations made to a "gpu" role would be
> > > available to all of the descendant roles within the tree, e.g.
> > > "gpu/analytics", "gpu/forecasting/training", etc.
> > > This is equivalent to a restricted version of reservations to
> > > multiple roles, where the roles are restricted to the descendant
> > > roles. This can get clumsy because if "eng/backend/image-processing"
> > > wants to get in on the reserved GPUs, the user would have to place a
> > > related role underneath the "gpu" role, e.g.
> > > "gpu/eng/backend/image-processing".
> >
> > The exact reason you mentioned about the "clumsy" part would
> > effectively prevent me from implementing this in our org even if it
> > were already available.
> >
> > > For the addition of the filter, note that this flag would be a
> > > temporary measure that would be removed when the deprecation cycle
> > > of the capability is complete. It would be good to independently
> > > consider the generalized filtering idea you brought up.
> > >
> > > On Mon, May 22, 2017 at 9:15 AM, Zhitao Li <zhitaoli...@gmail.com>
> > > wrote:
> > >
> > > > Hi Kevin,
> > > >
> > > > Thanks for engaging with the community on this. My 2 cents:
> > > >
> > > > 1. I feel that this capability has a particularly useful semantic
> > > > which is lacking in the current reservation system: reserving some
> > > > scarce resource for a *dynamic list of multiple roles*.
> > > >
> > > > Right now, any reservation (static or dynamic) can only express
> > > > the semantic of "reserving this resource for the given role R".
> > > > However, in a complex cluster, it is possible that we have
> > > > [R1, R2, ..., RN] which want to share the scarce resource among
> > > > them, while there is another set of roles which should never see
> > > > the given resource.
> > > >
> > > > The new hierarchical role (and/or multi-role?) support might
> > > > provide a better solution, but until that's widely available and
> > > > adopted, the capabilities-based hack is the only thing I know of
> > > > that can solve the problem.
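The sub-role semantics discussed above, where a reservation to the "gpu" role is usable by any of its descendant roles, reduce to a simple path-prefix check. The sketch below is purely illustrative; `is_descendant` is a hypothetical helper, not part of Mesos.

```python
def is_descendant(role, ancestor):
    """A role can use resources reserved to `ancestor` if it is the
    ancestor itself or sits underneath it in the role hierarchy."""
    return role == ancestor or role.startswith(ancestor + "/")

# Roles placed under "gpu" can use resources reserved to "gpu":
print(is_descendant("gpu/analytics", "gpu"))                 # True
print(is_descendant("gpu/forecasting/training", "gpu"))      # True

# An unrelated subtree cannot, which is the "clumsy" part: the role
# would have to be duplicated as "gpu/eng/backend/image-processing".
print(is_descendant("eng/backend/image-processing", "gpu"))  # False
```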
> > > > In fact, if we are going to go with the `--filter-gpu-resources`
> > > > path, I think we should make the filter more powerful (i.e., able
> > > > to handle all known framework <-> resource/host constraints and
> > > > more types of scarce resources) instead of piecewise patches on a
> > > > specific use case.
> > > >
> > > > Happy to chat more on this topic.
> > > >
> > > > On Sat, May 20, 2017 at 6:45 PM, Kevin Klues <klue...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello GPU users,
> > > > >
> > > > > We are currently considering deprecating the requirement that
> > > > > frameworks register with the GPU_RESOURCES capability in order
> > > > > to receive offers that contain GPUs. Going forward, we will
> > > > > recommend that users rely on Mesos's built-in `reservation`
> > > > > mechanism to achieve similar results.
> > > > >
> > > > > Before deprecating it, we wanted to get a sense from the
> > > > > community of whether anyone is currently relying on this
> > > > > capability and would like to see it persist. If not, we will
> > > > > begin deprecating it in the next Mesos release and completely
> > > > > remove it in Mesos 2.0.
> > > > >
> > > > > As background, the original motivation for this capability was
> > > > > to keep "legacy" frameworks from inadvertently scheduling jobs
> > > > > that don't require GPUs on GPU-capable machines and thus
> > > > > starving out other frameworks that legitimately want to place
> > > > > GPU jobs on those machines. The assumption here was that most
> > > > > machines in a cluster won't have GPUs installed on them, so some
> > > > > mechanism was necessary to keep legacy frameworks from
> > > > > scheduling jobs on those machines. In essence, it provided an
> > > > > implicit reservation of GPU machines for "GPU aware" frameworks,
> > > > > bypassing the traditional `reservation` mechanism already built
> > > > > into Mesos.
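To make the capability-based filtering described above concrete, here is a toy model of the master's offer decision under the current semantics. This is a sketch only; the real logic lives in the master's C++ allocator, and `should_offer` is a hypothetical name.

```python
GPU_RESOURCES = "GPU_RESOURCES"  # the framework capability in question

def should_offer(agent_resources, framework_capabilities):
    """Toy model: offers from agents exposing GPUs are sent only to
    frameworks that opted into the GPU_RESOURCES capability."""
    if agent_resources.get("gpus", 0) > 0:
        return GPU_RESOURCES in framework_capabilities
    return True  # non-GPU agents are offered to every framework

legacy = set()               # framework without the capability
gpu_aware = {GPU_RESOURCES}  # framework with the capability

print(should_offer({"cpus": 8, "gpus": 4}, legacy))     # False
print(should_offer({"cpus": 8, "gpus": 4}, gpu_aware))  # True
print(should_offer({"cpus": 8}, legacy))                # True
```

Note that when every agent in the cluster has GPUs, the first case applies everywhere and the legacy framework receives no offers at all, which is exactly the starvation problem described later in this message.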
> > > > > In such a setup, legacy frameworks would be free to schedule
> > > > > jobs on non-GPU machines, and "GPU aware" frameworks would be
> > > > > free to schedule GPU jobs on GPU machines and other types of
> > > > > jobs on other machines (or mix and match them however they
> > > > > please).
> > > > >
> > > > > However, the problem comes when *all* machines in a cluster
> > > > > contain GPUs (or even if most of the machines in a cluster
> > > > > contain them). When this is the case, we have the opposite of
> > > > > the problem we were trying to solve by introducing the
> > > > > GPU_RESOURCES capability in the first place. We end up starving
> > > > > out jobs from legacy frameworks that *don't* require GPU
> > > > > resources, because there are not enough machines available that
> > > > > don't have GPUs on them to service those jobs. We've actually
> > > > > seen this problem manifest in the wild at least once.
> > > > >
> > > > > An alternative to completely deprecating the GPU_RESOURCES
> > > > > capability would be to add a new flag to the mesos master called
> > > > > `--filter-gpu-resources`. When set to `true`, this flag would
> > > > > cause the mesos master to continue to function as it does today.
> > > > > That is, it would filter offers containing GPU resources and
> > > > > only send them to frameworks that opt into the GPU_RESOURCES
> > > > > framework capability. When set to `false`, this flag would cause
> > > > > the master to *not* filter offers containing GPU resources, and
> > > > > indiscriminately send them to all frameworks whether they set
> > > > > the GPU_RESOURCES capability or not.
> > > > >
> > > > > We'd prefer to deprecate the capability completely, but would
> > > > > consider adding this flag if people are currently relying on the
> > > > > GPU_RESOURCES capability and would like to see it persist; this
> > > > > flag would allow them to keep relying on it without disruption.
> > > > >
> > > > > We welcome any feedback you have.
> > > > >
> > > > > Kevin + Ben
> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > Zhitao Li
> >
> > --
> > Cheers,
> >
> > Zhitao Li
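As a rough illustration of the `reservation` mechanism recommended in the thread, a dynamic reservation of an agent's GPUs for a "gpu" role can be expressed as JSON and POSTed to the master's `/master/reserve` operator endpoint. This is a sketch: the agent ID and principal below are placeholders, and the exact resource schema and endpoint path should be verified against the documentation for the Mesos version in use.

```python
import json

# Sketch of a dynamic reservation request for the master's reserve
# operator endpoint. The agent ID and principal are placeholders.
reserve_request = {
    "slaveId": "<agent-id>",
    "resources": [
        {
            "name": "gpus",
            "type": "SCALAR",
            "scalar": {"value": 4.0},
            "role": "gpu",
            "reservation": {"principal": "<operator-principal>"},
        }
    ],
}

# The request would be sent roughly as:
#   curl -u <principal>:<secret> -d slaveId=<agent-id> \
#        -d resources='[...]' -X POST http://<master>:5050/master/reserve
print(json.dumps(reserve_request["resources"], indent=2))
```

With the resources reserved to the "gpu" role this way, only frameworks registered in that role (or, with hierarchical roles, its descendants) receive them in offers, without any framework capability filtering involved.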