Kevin, thanks for the useful info. It makes perfect sense to add GPU support to the docker containerizer, given the large docker user base and the relatively small development effort.

On Wed, Apr 10, 2019 at 4:17 AM Kevin Klues <klue...@gmail.com> wrote:

> Adding GPU support to the docker containerizer is not something that is
> very hard to do. The choice in the past to *not* build GPU support for the
> docker containerizer was a conscious one in order to get people moved over
> to the UCR instead. All of the other innovations we work on are prioritised
> for the UCR, and we didn't see a compelling reason to make an exception for
> GPU support. Building a solution around nvidia-docker would have been a
> solution requiring minimal changes to mesos, but then there would have been
> yet another dependency in the system that we didn't want to introduce.
>
> However, this was a decision made over 3 years ago, and maybe it's time to
> revisit it.
>
> The next docker release will include an integrated `--gpus` flag, bypassing
> the need for nvidia-docker entirely:
>
> https://github.com/docker/cli/pull/1714
>
> With this in place it really would be trivial to add support for GPUs to
> the docker containerizer, since there would be no requirement for users to
> do any external setup for nvidia-docker.
>
> What do people think? Has the landscape changed and does it now make sense
> to add GPU support for the docker containerizer given the new upcoming
> `--gpus` flag?
>
> Kevin
>
> On Fri, Apr 5, 2019 at 6:58 PM Benjamin Mahler <bmah...@apache.org> wrote:
>
> > +Kevin Klues
> >
> > On Fri, Apr 5, 2019 at 1:24 AM Huadong Liu <h...@yelp.com.invalid> wrote:
> >
> >> Hi Ben, thanks for pointing me to the docker containerizer ticket. I do
> >> see the value of UCR.
> >>
> >> Since nvidia-docker already takes care of mounting the driver etc., if we
> >> use the "--docker=nvidia-docker" agent option to replace the docker
> >> command with the nvidia-docker command, GPU support with the docker
> >> containerizer seems trivial. Did I miss anything?
> >>
> >> On Thu, Apr 4, 2019 at 8:00 PM Benjamin Mahler <bmah...@apache.org>
> >> wrote:
> >>
> >> > The "UCR" (aka mesos containerizer) and "Docker containerizer" are two
> >> > different containerizers that users tend to choose between. UCR is what
> >> > many of our serious users rely on and so we made the investment there
> >> > first. GPU support for the docker containerizer was also something that
> >> > was planned, but hasn't been prioritized:
> >> > https://issues.apache.org/jira/browse/MESOS-5795
> >> >
> >> > These days, many of our users use Docker images with UCR (i.e. bypassing
> >> > the need for the docker daemon).
> >> >
> >> > Maybe the containerization devs can chime in here if I'm saying anything
> >> > inaccurate or to shed some light on where things are headed.
> >> >
> >> > On Wed, Apr 3, 2019 at 2:21 PM Huadong Liu <h...@yelp.com.invalid>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > Nvidia GPU support in Mesos/Marathon mandates the mesos containerizer
> >> > > <https://github.com/mesosphere/marathon/blob/master/src/main/scala/mesosphere/marathon/state/AppDefinition.scala#L557>
> >> > > which "mimics" nvidia-docker.
> >> > > <http://mesos.apache.org/documentation/latest/gpu-support/> Can someone
> >> > > help me understand why docker containerizer with agent option
> >> > > "--docker=nvidia-docker" wasn't the choice? Thank you!
> >> > >
> >> > > --
> >> > > Huadong