Re: Kubernetes backend and docker images
I think we can allow for different images and default to them being the same. I apologize if I missed that as being the original intention, though.

-Matt Cheah

On 1/8/18, 1:45 PM, "Marcelo Vanzin" wrote:

On Mon, Jan 8, 2018 at 1:39 PM, Matt Cheah wrote:
> We would still want images to be able to be uniquely specified for the
> driver vs. the executors. For example, not all of the libraries required on
> the driver may be required on the executors, so the user would want to
> specify a different custom driver image from their custom executor image.

Are you saying that we should *require* different images for driver and executor, as is the case today, or that we should *allow* different images, but default to the same, as I'm proposing?

I see zero reason to require different images. While it's true that the driver may need more libraries than the executor, 99% of the time it's ok to just have those libraries everywhere - it makes configuration easier and doesn't do any harm.

--
Marcelo
Re: Kubernetes backend and docker images
On Mon, Jan 8, 2018 at 1:39 PM, Matt Cheah wrote:
> We would still want images to be able to be uniquely specified for the
> driver vs. the executors. For example, not all of the libraries required on
> the driver may be required on the executors, so the user would want to
> specify a different custom driver image from their custom executor image.

Are you saying that we should *require* different images for driver and executor, as is the case today, or that we should *allow* different images, but default to the same, as I'm proposing?

I see zero reason to require different images. While it's true that the driver may need more libraries than the executor, 99% of the time it's ok to just have those libraries everywhere - it makes configuration easier and doesn't do any harm.

--
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
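The "allow different images, but default to the same" behaviour discussed above can be sketched as a simple fallback lookup. This is only an illustration under stated assumptions: the property names and the `resolve_image` helper are placeholders chosen for this sketch and are not taken from the thread or from the linked commit.

```python
from typing import Mapping

# Hypothetical property names, used only for this illustration.
SHARED_IMAGE_KEY = "spark.kubernetes.container.image"
ROLE_IMAGE_KEYS = {
    "driver": "spark.kubernetes.driver.container.image",
    "executor": "spark.kubernetes.executor.container.image",
}


def resolve_image(conf: Mapping[str, str], role: str) -> str:
    """Return the image for `role`, falling back to the shared image."""
    if role not in ROLE_IMAGE_KEYS:
        raise ValueError(f"unknown role: {role!r}")
    # A role-specific setting wins; otherwise every role uses the shared image.
    image = conf.get(ROLE_IMAGE_KEYS[role]) or conf.get(SHARED_IMAGE_KEY)
    if not image:
        raise ValueError(f"no image configured for {role}")
    return image
```

With only the shared key set, driver and executor resolve to the same image; setting a role-specific key overrides it for that role alone, which covers the custom-driver-image case without requiring two images in the simple case.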
Re: Kubernetes backend and docker images
// Fixing Anirudh's email address

From: Matt Cheah
Sent: Monday, January 8, 2018 1:39:12 PM
To: Anirudh Ramanathan; Felix Cheung
Cc: 蒋星博; Marcelo Vanzin; dev; Timothy Chen
Subject: Re: Kubernetes backend and docker images

We would still want images to be able to be uniquely specified for the driver vs. the executors. For example, not all of the libraries required on the driver may be required on the executors, so the user would want to specify a different custom driver image from their custom executor image.

But the idea of the entry point script that can switch based on environment variables makes sense.

I do think we want separate Python and R images, because Python and R come with non-trivial extra baggage that can make the images a lot bigger and slower to download for Scala-only users.

From: Anirudh Ramanathan <ramanath...@google.com.INVALID>
Date: Monday, January 8, 2018 at 9:48 AM
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: 蒋星博 <jiangxb1...@gmail.com>, Marcelo Vanzin <van...@cloudera.com>, dev <dev@spark.apache.org>, Matt Cheah <mch...@palantir.com>, Timothy Chen <tnac...@gmail.com>
Subject: Re: Kubernetes backend and docker images

+matt +tim

For reference - here's our previous thread on this dockerfile unification problem - https://github.com/apache-spark-on-k8s/spark/pull/60

I think this approach should be acceptable from both the customization and visibility perspectives.

On Mon, Jan 8, 2018 at 9:40 AM, Anirudh Ramanathan <ramanath...@google.com> wrote:

+1

We discussed some alternatives early on - including using a single dockerfile and different spec.container.command and spec.container.args from the Kubernetes driver/executor specification (which override entrypoint in docker). No reason that won't work also - except that it reduced the transparency of what was being invoked in the driver/executor/init by hiding it in the actual backend code.

Putting it into a single entrypoint file and branching lets us realize the best of both worlds, I think. This is an elegant solution, thanks Marcelo.

On Jan 6, 2018 10:01 AM, "Felix Cheung" <felixcheun...@hotmail.com> wrote:

+1

Thanks for taking this on.
That was my feedback on one of the long comment threads as well, I think we should have one docker image instead of 3 (also pending in the fork are python and R variants, we should consider having one that we officially release instead of 9, for example)

From: 蒋星博 <jiangxb1...@gmail.com>
Sent: Friday, January 5, 2018 10:57:53 PM
To: Marcelo Vanzin
Cc: dev
Subject: Re: Kubernetes backend and docker images

Agree it should be nice to have this simplification, and users can still create their custom images by copying/modifying the default one.
Thanks for bringing this up Marcelo!

2018-01-05 17:06 GMT-08:00 Marcelo Vanzin <van...@cloudera.com>:

Hey all, especially those working on the k8s stuff.

Currently we have 3 docker images that need to be built and provided by the user when starting a Spark app: driver, executor, and init container.

When the initial review went by, I asked why do we need 3, and I was told that's because they have different entry points. That never really convinced me, but well, everybody wanted to get things in to get the ball rolling.

But I still think that's not the best way to go. I did some pretty simple hacking and got things to work with a single image:

https://github.com/vanzin/spark/commit/k8s-img

Is there a reason why that approach would not work? You could still create separate images for driver and executor if wanted, but there's no reason I can see why we should need 3 images for the simple case.

Note that the code there can be cleaned up still, and I don't love the idea of using env variables to propagate arguments to the container, but that works for now.

--
Marcelo

--
Anirudh Ramanathan
Re: Kubernetes backend and docker images
We would still want images to be able to be uniquely specified for the driver vs. the executors. For example, not all of the libraries required on the driver may be required on the executors, so the user would want to specify a different custom driver image from their custom executor image.

But the idea of the entry point script that can switch based on environment variables makes sense.

I do think we want separate Python and R images, because Python and R come with non-trivial extra baggage that can make the images a lot bigger and slower to download for Scala-only users.

From: Anirudh Ramanathan <ramanath...@google.com.INVALID>
Date: Monday, January 8, 2018 at 9:48 AM
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: 蒋星博 <jiangxb1...@gmail.com>, Marcelo Vanzin <van...@cloudera.com>, dev <dev@spark.apache.org>, Matt Cheah <mch...@palantir.com>, Timothy Chen <tnac...@gmail.com>
Subject: Re: Kubernetes backend and docker images

+matt +tim

For reference - here's our previous thread on this dockerfile unification problem - https://github.com/apache-spark-on-k8s/spark/pull/60

I think this approach should be acceptable from both the customization and visibility perspectives.

On Mon, Jan 8, 2018 at 9:40 AM, Anirudh Ramanathan <ramanath...@google.com> wrote:

+1

We discussed some alternatives early on - including using a single dockerfile and different spec.container.command and spec.container.args from the Kubernetes driver/executor specification (which override entrypoint in docker). No reason that won't work also - except that it reduced the transparency of what was being invoked in the driver/executor/init by hiding it in the actual backend code.

Putting it into a single entrypoint file and branching lets us realize the best of both worlds, I think. This is an elegant solution, thanks Marcelo.

On Jan 6, 2018 10:01 AM, "Felix Cheung" <felixcheun...@hotmail.com> wrote:

+1

Thanks for taking this on.
That was my feedback on one of the long comment threads as well, I think we should have one docker image instead of 3 (also pending in the fork are python and R variants, we should consider having one that we officially release instead of 9, for example)

From: 蒋星博 <jiangxb1...@gmail.com>
Sent: Friday, January 5, 2018 10:57:53 PM
To: Marcelo Vanzin
Cc: dev
Subject: Re: Kubernetes backend and docker images

Agree it should be nice to have this simplification, and users can still create their custom images by copying/modifying the default one.
Thanks for bringing this up Marcelo!

2018-01-05 17:06 GMT-08:00 Marcelo Vanzin <van...@cloudera.com>:

Hey all, especially those working on the k8s stuff.

Currently we have 3 docker images that need to be built and provided by the user when starting a Spark app: driver, executor, and init container.

When the initial review went by, I asked why do we need 3, and I was told that's because they have different entry points. That never really convinced me, but well, everybody wanted to get things in to get the ball rolling.

But I still think that's not the best way to go. I did some pretty simple hacking and got things to work with a single image:

https://github.com/vanzin/spark/commit/k8s-img

Is there a reason why that approach would not work? You could still create separate images for driver and executor if wanted, but there's no reason I can see why we should need 3 images for the simple case.

Note that the code there can be cleaned up still, and I don't love the idea of using env variables to propagate arguments to the container, but that works for now.

--
Marcelo

--
Anirudh Ramanathan
Re: Kubernetes backend and docker images
+matt +tim

For reference - here's our previous thread on this dockerfile unification problem - https://github.com/apache-spark-on-k8s/spark/pull/60

I think this approach should be acceptable from both the customization and visibility perspectives.

On Mon, Jan 8, 2018 at 9:40 AM, Anirudh Ramanathan <ramanath...@google.com> wrote:

> +1
>
> We discussed some alternatives early on - including using a single
> dockerfile and different spec.container.command and spec.container.args
> from the Kubernetes driver/executor specification (which override
> entrypoint in docker). No reason that won't work also - except that it
> reduced the transparency of what was being invoked in the
> driver/executor/init by hiding it in the actual backend code.
>
> Putting it into a single entrypoint file and branching lets us realize
> the best of both worlds, I think. This is an elegant solution, thanks
> Marcelo.
>
> On Jan 6, 2018 10:01 AM, "Felix Cheung" <felixcheun...@hotmail.com> wrote:
>
>> +1
>>
>> Thanks for taking this on.
>> That was my feedback on one of the long comment threads as well, I think
>> we should have one docker image instead of 3 (also pending in the fork are
>> python and R variants, we should consider having one that we officially
>> release instead of 9, for example)
>>
>>
>> --
>> *From:* 蒋星博 <jiangxb1...@gmail.com>
>> *Sent:* Friday, January 5, 2018 10:57:53 PM
>> *To:* Marcelo Vanzin
>> *Cc:* dev
>> *Subject:* Re: Kubernetes backend and docker images
>>
>> Agree it should be nice to have this simplification, and users can still
>> create their custom images by copying/modifying the default one.
>> Thanks for bringing this up Marcelo!
>>
>> 2018-01-05 17:06 GMT-08:00 Marcelo Vanzin <van...@cloudera.com>:
>>
>>> Hey all, especially those working on the k8s stuff.
>>>
>>> Currently we have 3 docker images that need to be built and provided
>>> by the user when starting a Spark app: driver, executor, and init
>>> container.
>>>
>>> When the initial review went by, I asked why do we need 3, and I was
>>> told that's because they have different entry points. That never
>>> really convinced me, but well, everybody wanted to get things in to
>>> get the ball rolling.
>>>
>>> But I still think that's not the best way to go. I did some pretty
>>> simple hacking and got things to work with a single image:
>>>
>>> https://github.com/vanzin/spark/commit/k8s-img
>>>
>>> Is there a reason why that approach would not work? You could still
>>> create separate images for driver and executor if wanted, but there's
>>> no reason I can see why we should need 3 images for the simple case.
>>>
>>> Note that the code there can be cleaned up still, and I don't love the
>>> idea of using env variables to propagate arguments to the container,
>>> but that works for now.
>>>
>>> --
>>> Marcelo

--
Anirudh Ramanathan
Re: Kubernetes backend and docker images
+1

We discussed some alternatives early on - including using a single dockerfile and different spec.container.command and spec.container.args from the Kubernetes driver/executor specification (which override entrypoint in docker). No reason that won't work also - except that it reduced the transparency of what was being invoked in the driver/executor/init by hiding it in the actual backend code.

Putting it into a single entrypoint file and branching lets us realize the best of both worlds, I think. This is an elegant solution, thanks Marcelo.

On Jan 6, 2018 10:01 AM, "Felix Cheung" <felixcheun...@hotmail.com> wrote:

> +1
>
> Thanks for taking this on.
> That was my feedback on one of the long comment threads as well, I think we
> should have one docker image instead of 3 (also pending in the fork are
> python and R variants, we should consider having one that we officially
> release instead of 9, for example)
>
>
> --
> *From:* 蒋星博 <jiangxb1...@gmail.com>
> *Sent:* Friday, January 5, 2018 10:57:53 PM
> *To:* Marcelo Vanzin
> *Cc:* dev
> *Subject:* Re: Kubernetes backend and docker images
>
> Agree it should be nice to have this simplification, and users can still
> create their custom images by copying/modifying the default one.
> Thanks for bringing this up Marcelo!
>
> 2018-01-05 17:06 GMT-08:00 Marcelo Vanzin <van...@cloudera.com>:
>
>> Hey all, especially those working on the k8s stuff.
>>
>> Currently we have 3 docker images that need to be built and provided
>> by the user when starting a Spark app: driver, executor, and init
>> container.
>>
>> When the initial review went by, I asked why do we need 3, and I was
>> told that's because they have different entry points. That never
>> really convinced me, but well, everybody wanted to get things in to
>> get the ball rolling.
>>
>> But I still think that's not the best way to go. I did some pretty
>> simple hacking and got things to work with a single image:
>>
>> https://github.com/vanzin/spark/commit/k8s-img
>>
>> Is there a reason why that approach would not work? You could still
>> create separate images for driver and executor if wanted, but there's
>> no reason I can see why we should need 3 images for the simple case.
>>
>> Note that the code there can be cleaned up still, and I don't love the
>> idea of using env variables to propagate arguments to the container,
>> but that works for now.
>>
>> --
>> Marcelo
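The two alternatives compared in this message can be contrasted as container entries in a pod spec. The sketch below is an illustration only: the helper names, the container name, the `SPARK_K8S_ROLE` variable, and the script path in the usage note are all hypothetical. What it relies on from Kubernetes is that a container's `command` replaces the image's ENTRYPOINT and `args` replaces its CMD.

```python
from typing import Any, Dict, List

# Hypothetical variable name; not taken from the thread or the linked commit.
ROLE_VAR = "SPARK_K8S_ROLE"


def container_with_command(image: str, command: List[str], args: List[str]) -> Dict[str, Any]:
    """Alternative 1: the backend sets command/args, overriding the image's
    ENTRYPOINT/CMD. What runs is decided in backend code."""
    return {"name": "spark", "image": image, "command": command, "args": args}


def container_with_env(image: str, role: str) -> Dict[str, Any]:
    """Alternative 2: keep the image's own entrypoint and only tell it which
    role to run through an environment variable."""
    return {
        "name": "spark",
        "image": image,
        "env": [{"name": ROLE_VAR, "value": role}],
    }
```

For example, `container_with_command("spark:latest", ["/opt/entry/driver.sh"], [])` hides the choice of script in the caller, whereas `container_with_env("spark:latest", "driver")` leaves the branching visible in the image's single entrypoint file, which is the transparency point made above.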
Re: Kubernetes backend and docker images
+1

Thanks for taking this on.
That was my feedback on one of the long comment threads as well, I think we should have one docker image instead of 3 (also pending in the fork are python and R variants, we should consider having one that we officially release instead of 9, for example)

From: 蒋星博 <jiangxb1...@gmail.com>
Sent: Friday, January 5, 2018 10:57:53 PM
To: Marcelo Vanzin
Cc: dev
Subject: Re: Kubernetes backend and docker images

Agree it should be nice to have this simplification, and users can still create their custom images by copying/modifying the default one.
Thanks for bringing this up Marcelo!

2018-01-05 17:06 GMT-08:00 Marcelo Vanzin <van...@cloudera.com>:

Hey all, especially those working on the k8s stuff.

Currently we have 3 docker images that need to be built and provided by the user when starting a Spark app: driver, executor, and init container.

When the initial review went by, I asked why do we need 3, and I was told that's because they have different entry points. That never really convinced me, but well, everybody wanted to get things in to get the ball rolling.

But I still think that's not the best way to go. I did some pretty simple hacking and got things to work with a single image:

https://github.com/vanzin/spark/commit/k8s-img

Is there a reason why that approach would not work? You could still create separate images for driver and executor if wanted, but there's no reason I can see why we should need 3 images for the simple case.

Note that the code there can be cleaned up still, and I don't love the idea of using env variables to propagate arguments to the container, but that works for now.

--
Marcelo
Re: Kubernetes backend and docker images
Agree it should be nice to have this simplification, and users can still create their custom images by copying/modifying the default one.
Thanks for bringing this up Marcelo!

2018-01-05 17:06 GMT-08:00 Marcelo Vanzin:

> Hey all, especially those working on the k8s stuff.
>
> Currently we have 3 docker images that need to be built and provided
> by the user when starting a Spark app: driver, executor, and init
> container.
>
> When the initial review went by, I asked why do we need 3, and I was
> told that's because they have different entry points. That never
> really convinced me, but well, everybody wanted to get things in to
> get the ball rolling.
>
> But I still think that's not the best way to go. I did some pretty
> simple hacking and got things to work with a single image:
>
> https://github.com/vanzin/spark/commit/k8s-img
>
> Is there a reason why that approach would not work? You could still
> create separate images for driver and executor if wanted, but there's
> no reason I can see why we should need 3 images for the simple case.
>
> Note that the code there can be cleaned up still, and I don't love the
> idea of using env variables to propagate arguments to the container,
> but that works for now.
>
> --
> Marcelo
Re: Kubernetes backend and docker images
Awesome, less is better

On Sat, Jan 6, 2018 at 11:54 AM, Mridul Muralidharan wrote:

>
> We should definitely clean this up and make it the default, nicely done
> Marcelo!
>
> Thanks,
> Mridul
>
> On Fri, Jan 5, 2018 at 5:06 PM Marcelo Vanzin wrote:
>
>> Hey all, especially those working on the k8s stuff.
>>
>> Currently we have 3 docker images that need to be built and provided
>> by the user when starting a Spark app: driver, executor, and init
>> container.
>>
>> When the initial review went by, I asked why do we need 3, and I was
>> told that's because they have different entry points. That never
>> really convinced me, but well, everybody wanted to get things in to
>> get the ball rolling.
>>
>> But I still think that's not the best way to go. I did some pretty
>> simple hacking and got things to work with a single image:
>>
>> https://github.com/vanzin/spark/commit/k8s-img
>>
>> Is there a reason why that approach would not work? You could still
>> create separate images for driver and executor if wanted, but there's
>> no reason I can see why we should need 3 images for the simple case.
>>
>> Note that the code there can be cleaned up still, and I don't love the
>> idea of using env variables to propagate arguments to the container,
>> but that works for now.
>>
>> --
>> Marcelo
Re: Kubernetes backend and docker images
We should definitely clean this up and make it the default, nicely done Marcelo!

Thanks,
Mridul

On Fri, Jan 5, 2018 at 5:06 PM Marcelo Vanzin wrote:

> Hey all, especially those working on the k8s stuff.
>
> Currently we have 3 docker images that need to be built and provided
> by the user when starting a Spark app: driver, executor, and init
> container.
>
> When the initial review went by, I asked why do we need 3, and I was
> told that's because they have different entry points. That never
> really convinced me, but well, everybody wanted to get things in to
> get the ball rolling.
>
> But I still think that's not the best way to go. I did some pretty
> simple hacking and got things to work with a single image:
>
> https://github.com/vanzin/spark/commit/k8s-img
>
> Is there a reason why that approach would not work? You could still
> create separate images for driver and executor if wanted, but there's
> no reason I can see why we should need 3 images for the simple case.
>
> Note that the code there can be cleaned up still, and I don't love the
> idea of using env variables to propagate arguments to the container,
> but that works for now.
>
> --
> Marcelo
Re: Kubernetes backend and docker images
This is neat. With some code cleanup and as long as users can still use custom driver/executor/init-container images if they want to, I think this is great to have. I don't think there's a particular reason why having a single image wouldn't work. Thanks for doing this!

On Fri, Jan 5, 2018 at 5:06 PM, Marcelo Vanzin wrote:

> Hey all, especially those working on the k8s stuff.
>
> Currently we have 3 docker images that need to be built and provided
> by the user when starting a Spark app: driver, executor, and init
> container.
>
> When the initial review went by, I asked why do we need 3, and I was
> told that's because they have different entry points. That never
> really convinced me, but well, everybody wanted to get things in to
> get the ball rolling.
>
> But I still think that's not the best way to go. I did some pretty
> simple hacking and got things to work with a single image:
>
> https://github.com/vanzin/spark/commit/k8s-img
>
> Is there a reason why that approach would not work? You could still
> create separate images for driver and executor if wanted, but there's
> no reason I can see why we should need 3 images for the simple case.
>
> Note that the code there can be cleaned up still, and I don't love the
> idea of using env variables to propagate arguments to the container,
> but that works for now.
>
> --
> Marcelo
Kubernetes backend and docker images
Hey all, especially those working on the k8s stuff.

Currently we have 3 docker images that need to be built and provided by the user when starting a Spark app: driver, executor, and init container.

When the initial review went by, I asked why do we need 3, and I was told that's because they have different entry points. That never really convinced me, but well, everybody wanted to get things in to get the ball rolling.

But I still think that's not the best way to go. I did some pretty simple hacking and got things to work with a single image:

https://github.com/vanzin/spark/commit/k8s-img

Is there a reason why that approach would not work? You could still create separate images for driver and executor if wanted, but there's no reason I can see why we should need 3 images for the simple case.

Note that the code there can be cleaned up still, and I don't love the idea of using env variables to propagate arguments to the container, but that works for now.

--
Marcelo
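The single-image idea in this message depends on one entry point that decides what to run from its environment. A minimal sketch of that dispatch follows, written in Python purely for illustration; the linked commit is not reproduced here, and the variable name `SPARK_K8S_ROLE` and the three commands are hypothetical stand-ins.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical variable name and commands, chosen only for this sketch.
ROLE_VAR = "SPARK_K8S_ROLE"
COMMANDS: Dict[str, List[str]] = {
    "driver": ["run-driver"],
    "executor": ["run-executor"],
    "init": ["run-init"],
}


def build_command(env: Mapping[str, str]) -> List[str]:
    """Pick the command line for the role named in the environment."""
    role = env.get(ROLE_VAR)
    if role not in COMMANDS:
        raise ValueError(f"unknown or missing {ROLE_VAR}: {role!r}")
    return list(COMMANDS[role])


def main() -> None:
    """Replace the current process with the command for this container's role."""
    cmd = build_command(os.environ)
    os.execvp(cmd[0], cmd)
```

An image built this way would call `main()` from its one entry point, so the same image can serve as driver, executor, or init container depending only on the variable the backend sets. Keeping `build_command` separate from the `exec` call makes the branching testable without starting a process.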