Would it be logical to provide Docker-based distributions of other pieces
of Spark, or is this specific to K8s?
The problem is that we wouldn't generally provide yet another distribution
of Spark for the reasons you give, because if we accept that reasoning,
then why not RPMs and so on?

On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <ramanath...@google.com>
wrote:

> In this context, I think the docker images are similar to the binaries
> rather than an extension.
> They package the compiled distribution to save people the effort of
> building one themselves, akin to the binaries or the Python package.
>
> For reference, this is the base dockerfile
> <https://github.com/apache-spark-on-k8s/spark/tree/branch-2.2-kubernetes/resource-managers/kubernetes/docker-minimal-bundle/src/main/docker/spark-base>
> for the main image that we intend to publish. It's not particularly
> complicated.
> The driver
> <https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/docker-minimal-bundle/src/main/docker/driver/Dockerfile>
> and executor
> <https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/docker-minimal-bundle/src/main/docker/executor/Dockerfile>
> images are based on said base image and only customize the CMD (any
> file/directory inclusions are extraneous and will be removed).
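>
> As a rough illustration (a minimal sketch only, not the actual
> Dockerfile; the image name and command here are assumptions for
> illustration), a driver image along these lines is essentially:
>
>     # Hypothetical driver Dockerfile: inherit the Spark distribution from
>     # the base image and override only the command that starts the driver.
>     FROM spark-base
>     CMD ["/opt/spark/bin/spark-submit"]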
>
> Is there only one way to build it? That's a bit harder to reason about.
> The base image, I'd argue, is likely always going to be built that way.
> For the driver and executor images, there may be cases where people want
> to customize them (like putting all of their dependencies into the image,
> for example). In those cases, as long as our images are bare bones, people
> can use the spark-driver/spark-executor images we publish as the base and
> build their customization as a layer on top, as in the sketch below.
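>
> For instance (a hypothetical sketch; the published image name and tag are
> assumptions, not actual coordinates), someone could do:
>
>     # Hypothetical user Dockerfile: start from a published Spark image and
>     # add application jars as an extra layer on top.
>     FROM spark-driver:2.3.0
>     COPY app-jars/*.jar /opt/spark/jars/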
>
> I think the composability of docker images makes this a bit different
> from, say, Debian packages.
> We can publish canonical images that serve both as a complete image for
> most Spark applications and as a stable substrate to build customization
> upon.
>
> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
>> It's probably also worth considering whether there is only one,
>> well-defined, correct way to create such an image or whether this is a
>> reasonable avenue for customization. Part of the reason we don't do something
>> like maintaining and publishing canonical Debian packages for Spark is that
>> different organizations doing packaging and distribution of infrastructures
>> or operating systems can reasonably want to do this in a custom (or
>> non-customary) way. If there is really only one reasonable way to do a
>> docker image, then my bias starts to tend more toward the Spark PMC taking
>> on the responsibility to maintain and publish that image. If there is more
>> than one way to do it and publishing a particular image is more just a
>> convenience, then my bias tends more away from maintaining and publishing it.
>>
>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Source code is the primary release; compiled binary releases are
>>> conveniences that are also released. A docker image sounds fairly different
>>> though. To the extent it's the standard delivery mechanism for some
>>> artifact (think: pyspark on PyPI as well), that makes sense, but is that the
>>> situation? If it's more of an extension or alternate presentation of Spark
>>> components, that typically wouldn't be part of a Spark release. The ones
>>> the PMC takes responsibility for maintaining ought to be the core, critical
>>> means of distribution alone.
>>>
>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan <
>>> ramanath...@google.com.invalid> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We're all working on the Kubernetes scheduler backend (full steam
>>>> ahead!) that's targeted for Spark 2.3. One of the questions that comes
>>>> up often is docker images.
>>>>
>>>> While we're making available dockerfiles to allow people to create
>>>> their own docker images from source, ideally, we'd want to publish official
>>>> docker images as part of the release process.
>>>>
>>>> I understand that the ASF has a procedure around this, and we would want
>>>> to get that started so these artifacts can be published by 2.3. I'd love
>>>> to start a discussion on this and hear the community's thoughts.
>>>>
>>>> --
>>>> Thanks,
>>>> Anirudh Ramanathan
>>>>
>>>
>>
>
>
> --
> Anirudh Ramanathan
>
