Would it be logical to provide Docker-based distributions of other pieces of Spark, or is this specific to K8S? The problem is, we wouldn't generally also provide such a distribution of Spark, for the reasons you give; if we did that, then why not RPMs, and so on.
On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <ramanath...@google.com> wrote:

> In this context, I think the docker images are similar to the binaries
> rather than an extension. It's packaging the compiled distribution to
> save people the effort of building one themselves, akin to binaries or
> the python package.
>
> For reference, this is the base dockerfile
> <https://github.com/apache-spark-on-k8s/spark/tree/branch-2.2-kubernetes/resource-managers/kubernetes/docker-minimal-bundle/src/main/docker/spark-base>
> for the main image that we intend to publish. It's not particularly
> complicated. The driver
> <https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/docker-minimal-bundle/src/main/docker/driver/Dockerfile>
> and executor
> <https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/docker-minimal-bundle/src/main/docker/executor/Dockerfile>
> images are based on that base image and only customize the CMD (any
> file/directory inclusions are extraneous and will be removed).
>
> Is there only one way to build it? That's a bit harder to reason about.
> The base image, I'd argue, is likely always going to be built that way.
> For the driver and executor images, there may be cases where people want
> to customize them (like putting all dependencies into them, for example).
> In those cases, as long as our images are bare-bones, they can use the
> spark-driver/spark-executor images we publish as the base, and build
> their customization as a layer on top.
>
> I think the composability of docker images makes this a bit different
> from, say, debian packages. We can publish canonical images that serve
> as both a complete image for most Spark applications and a stable
> substrate to build customization upon.
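[The layering Anirudh describes might look like the sketch below. The base image name and tag are illustrative only, since no official images have been published yet.]

```dockerfile
# Sketch of a user customization layered on a published bare-bones image.
# The "spark-executor:2.3.0" name/tag is hypothetical, not an official artifact.
FROM spark-executor:2.3.0

# Bake application dependencies directly into the image as an extra layer,
# instead of shipping them at submission time.
COPY my-app-assembly.jar /opt/spark/jars/

# CMD is inherited from the base executor image, so no further
# customization is needed here.
```

[Because the published images would stay minimal, a derived image like this only adds layers on top of the canonical one, which is the composability argument being made above.]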
>
> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>
>> It's probably also worth considering whether there is only one,
>> well-defined, correct way to create such an image, or whether this is a
>> reasonable avenue for customization. Part of why we don't do something
>> like maintain and publish canonical Debian packages for Spark is that
>> different organizations doing packaging and distribution of
>> infrastructures or operating systems can reasonably want to do this in
>> a custom (or non-customary) way. If there is really only one reasonable
>> way to do a docker image, then my bias starts to tend more toward the
>> Spark PMC taking on the responsibility to maintain and publish that
>> image. If there is more than one way to do it and publishing a
>> particular image is more just a convenience, then my bias tends more
>> away from maintaining and publishing it.
>>
>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Source code is the primary release; compiled binary releases are
>>> conveniences that are also released. A docker image sounds fairly
>>> different, though. To the extent it's the standard delivery mechanism
>>> for some artifact (think: pyspark on PyPI as well) that makes sense,
>>> but is that the situation? If it's more of an extension or alternate
>>> presentation of Spark components, that typically wouldn't be part of a
>>> Spark release. The ones the PMC takes responsibility for maintaining
>>> ought to be the core, critical means of distribution alone.
>>>
>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We're all working towards the Kubernetes scheduler backend (full
>>>> steam ahead!) that's targeted at Spark 2.3. One of the questions that
>>>> comes up often is docker images.
>>>>
>>>> While we're making dockerfiles available so that people can create
>>>> their own docker images from source, ideally we'd want to publish
>>>> official docker images as part of the release process.
>>>>
>>>> I understand that the ASF has procedure around this, and we would
>>>> want to get that started to help us get these artifacts published by
>>>> 2.3. I'd love to start a discussion around this and hear the
>>>> community's thoughts.
>>>>
>>>> --
>>>> Thanks,
>>>> Anirudh Ramanathan
>>>
>>
>
> --
> Anirudh Ramanathan