Re: building docker images for GPU
Just noticed that current Spark task scheduling doesn't recognize any /device as a constraint. As a result, multiple tasks could end up stuck racing to acquire the same GPU/FPGA (you name it). I'm not sure whether running multiple processes on one GPU works the same way it does on a CPU. If not, we should consider some kind of binding in the task scheduler and executorInfo, e.g.:

task 0: executor 1, 2 cpu, /device/gpu/0
task 1: executor 1, 2 cpu, /device/gpu/1

Chen

On Tue, Feb 12, 2019 at 11:04 AM Marcelo Vanzin wrote:
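The task-to-device binding idea above can be sketched as a simple round-robin assignment of device indices to tasks. This is an illustration only, not actual Spark scheduler code or configuration; the "executor 1, 2 cpu" layout just mirrors the example in the message:

```shell
# Hedged sketch: assign each task a GPU index round-robin, so no two
# concurrent tasks on the same executor race for the same device.
NUM_GPUS=2
for task in 0 1 2 3; do
  gpu=$(( task % NUM_GPUS ))
  echo "task $task -> executor 1, 2 cpu, /device/gpu/$gpu"
done
```

In practice a similar effect is often approximated today by setting CUDA_VISIBLE_DEVICES differently for each worker process before launch, so each process sees only its assigned device.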
Re: building docker images for GPU
I think I remember someone mentioning a thread about this on the PR discussion, and digging a bit I found this:

http://apache-spark-developers-list.1001551.n3.nabble.com/Toward-an-quot-API-quot-for-spark-images-used-by-the-Kubernetes-back-end-td23622.html

It started a discussion but I haven't really found any conclusion.

In my view here the discussion is the same: what is the contract between the Spark code that launches the driver / executor pods, and the images?

Right now the contract is defined by the code, which makes it a little awkward for people to have their own customized images. They need to kinda follow what the images in the repo do and hope they get it right.

If instead you define the contract and make the code follow it, then it becomes easier for people to provide whatever image they want.

Matt also filed SPARK-24655, which has seen no progress nor discussion.

Someone else filed SPARK-26773, which is similar.

And another person filed SPARK-26597, which is also in the same vein, and also suggests something that in the end I agree with: Spark shouldn't be opinionated about the image and what it has; it should tell the container to run a Spark command to start the driver or executor, which should be in the image's path, and shouldn't require an entry point at all.

Anyway, just wanted to point out that this discussion isn't as simple as "GPU vs. not GPU", but it's a more fundamental discussion about what should the container image look like, so that people can customize it easily. After all, that's one of the main points of using container images, right?

On Mon, Feb 11, 2019 at 11:53 AM Matt Cheah wrote:

--
Marcelo
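The entry-point-free contract described above could look roughly like the check below: the launcher assumes only that the image puts a known command on its PATH, and runs it directly. The command name `spark-executor` is a hypothetical placeholder, not an actual Spark binary:

```shell
# Hedged sketch: a launcher that only assumes the image provides a known
# command on PATH, instead of depending on the image's ENTRYPOINT.
# check_contract reports whether a given command is available.
check_contract() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "missing"
  fi
}

check_contract sh               # a command that certainly exists
check_contract spark-executor   # hypothetical name; likely missing here
```

Under such a contract, a user's custom image satisfies Spark by shipping the expected command anywhere on PATH, rather than by replicating a specific entry-point script.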
Re: building docker images for GPU
I will reiterate some feedback I left on the PR. It's not immediately clear if we should be opinionated around supporting GPUs in the Docker image in a first-class way.

First, there's the question of how we arbitrate the kinds of customizations we support moving forward. For example, if we say we support GPUs now, what's to say that we should not also support FPGAs?

Also, what kind of testing can we add to CI to ensure what we've provided in this Dockerfile works?

Instead we can make the Spark images have bare-minimum support for basic Spark applications, and then provide detailed instructions for how to build custom Docker images (mostly just needing to make sure the custom image has the right entry point).

-Matt Cheah

From: Rong Ou
Date: Friday, February 8, 2019 at 2:28 PM
To: "dev@spark.apache.org"
Subject: building docker images for GPU
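The custom-image approach suggested above, extending a stock Spark image while leaving its entry point intact, might be sketched roughly as follows. The base image tag `spark:2.4.0` and the package name `cuda-toolkit` are placeholder assumptions, not the actual names:

```shell
# Hedged sketch: generate a Dockerfile that extends a stock Spark image
# with GPU libraries but deliberately does NOT override ENTRYPOINT,
# since the Kubernetes backend invokes the image's own entry point.
cat > Dockerfile.gpu <<'EOF'
FROM spark:2.4.0
USER root
RUN apt-get update \
 && apt-get install -y --no-install-recommends cuda-toolkit \
 && rm -rf /var/lib/apt/lists/*
# Intentionally no ENTRYPOINT line: keep the base image's entry point.
EOF
echo "wrote Dockerfile.gpu"
```

The image is then built and pushed as usual (`docker build -f Dockerfile.gpu -t my-spark-gpu .`); the key point is only that nothing in the derived image replaces the entry point the Spark backend expects.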
building docker images for GPU
Hi spark dev,

I created a JIRA issue a while ago (https://issues.apache.org/jira/browse/SPARK-26398) to add GPU support to Spark docker images, and sent a PR (https://github.com/apache/spark/pull/23347) that went through several iterations. It was suggested that it should be discussed on the dev mailing list, so here we are. Please chime in if you have any questions or concerns.

A little more background. I mainly looked at running XGBoost on Spark using GPUs. Preliminary results have shown that there is potential for significant speedup in training time. This seems like a popular use case for Spark. In any event, it'd be nice for Spark to have better support for GPUs. Building gpu-enabled docker images seems like a useful first step.

Thanks,

Rong