On 05/21/2017 03:45 AM, Kevin Klues wrote:
> Hello GPU users,
>
> We are currently considering deprecating the requirement that frameworks register with the GPU_RESOURCES capability in order to receive offers that contain GPUs. Going forward, we will recommend that users rely on Mesos's builtin `reservation` mechanism to achieve similar results.
>
> Before deprecating it, we wanted to get a sense from the community if anyone is currently relying on this capability and would like to see it persist. If not, we will begin deprecating it in the next Mesos release and completely remove it in Mesos 2.0.

Well, I am using it for the GoDocker framework, where jobs can specify whether or not to use some GPUs.

> As background, the original motivation for this capability was to keep “legacy” frameworks from inadvertently scheduling jobs that don’t require GPUs on GPU-capable machines and thus starving out other frameworks that legitimately want to place GPU jobs on those machines. The assumption here was that most machines in a cluster won't have GPUs installed on them, so some mechanism was necessary to keep legacy frameworks from scheduling jobs on those machines. In essence, it provided an implicit reservation of GPU machines for "GPU aware" frameworks, bypassing the traditional `reservation` mechanism already built into Mesos.
>
> In such a setup, legacy frameworks would be free to schedule jobs on non-GPU machines, and "GPU aware" frameworks would be free to schedule GPU jobs on GPU machines and other types of jobs on other machines (or mix and match them however they please).
>
> However, the problem comes when *all* machines in a cluster contain GPUs (or even if most of the machines in a cluster contain them). When this is the case, we have the opposite of the problem we were trying to solve by introducing the GPU_RESOURCES capability in the first place. We end up starving out jobs from legacy frameworks that *don’t* require GPU resources because there are not enough machines available that don’t have GPUs on them to service those jobs. We've actually seen this problem manifest in the wild at least once.
>
> An alternative to completely deprecating the GPU_RESOURCES flag would be to add a new flag to the mesos master called `--filter-gpu-resources`. When set to `true`, this flag will cause the mesos master to continue to function as it does today. That is, it would filter offers containing GPU resources and only send them to frameworks that opt into the GPU_RESOURCES framework capability. When set to `false`, this flag would cause the master to *not* filter offers containing GPU resources, and indiscriminately send them to all frameworks whether they set the GPU_RESOURCES capability or not.
>
> , this flag would allow them to keep relying on it without disruption.
>
> We'd prefer to deprecate the capability completely, but would consider adding this flag if people are currently relying on the GPU_RESOURCES capability and would like to see it persist.
>
> We welcome any feedback you have.
>
> Kevin + Ben
-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438 (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335 D26D 78DC 68DB 326D 8438