Currently the containers are based off alpine, which pulls in BSD2 and MIT licensing: https://github.com/apache/spark/pull/19717#discussion_r154502824
To the best of my understanding, neither of those poses a problem. If we based the image off of centos I'd also expect the licensing of any image deps to be compatible.

On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> What licensing issues come into play?
>
> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerla...@redhat.com> wrote:
>> We've been discussing the topic of container images a bit more. The kubernetes back-end operates by executing some specific CMD and ENTRYPOINT logic, which is different than mesos, and which is probably not practical to unify at this level.
>>
>> However: these CMD and ENTRYPOINT configurations are essentially just a thin skin on top of an image which is just an install of a spark distro. We feel that a single "spark-base" image should be publishable, one that is consumable by kube-spark images, mesos-spark images, and likely any other community image whose primary purpose is running spark components. The kube-specific dockerfiles would be written "FROM spark-base" and just add the small command and entrypoint layers. Likewise, the mesos images could add any specialization layers that are necessary on top of the "spark-base" image.
>>
>> Does this factorization sound reasonable to others?
>> Cheers,
>> Erik
>>
>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>> We do support running on Apache Mesos via docker images - so this would not be restricted to k8s. But unlike mesos support, which has other modes of running, I believe k8s support more heavily depends on the availability of docker images.
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <so...@cloudera.com> wrote:
>>>> Would it be logical to provide Docker-based distributions of other pieces of Spark? Or is this specific to K8S? The problem is we wouldn't generally also provide a distribution of Spark for the reasons you give, because if that, then why not RPMs and so on.
>>>>
>>>> On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <ramanath...@google.com> wrote:
>>>>> In this context, I think the docker images are similar to the binaries rather than an extension. It's packaging the compiled distribution to save people the effort of building one themselves, akin to binaries or the python package.
>>>>>
>>>>> For reference, this is the base dockerfile for the main image that we intend to publish. It's not particularly complicated. The driver and executor images are based on said base image and only customize the CMD (any file/directory inclusions are extraneous and will be removed).
>>>>>
>>>>> Is there only one way to build it? That's a bit harder to reason about. The base image, I'd argue, is likely going to always be built that way. For the driver and executor images, there may be cases where people want to customize them (like putting all dependencies into them, for example). In those cases, as long as our images are bare bones, they can use the spark-driver/spark-executor images we publish as the base and build their customization as a layer on top.
>>>>>
>>>>> I think the composability of docker images makes this a bit different from, say, debian packages.
>>>>> We can publish canonical images that serve as both a complete image for most Spark applications and a stable substrate to build customization upon.
>>>>>
>>>>> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>>> It's probably also worth considering whether there is only one, well-defined, correct way to create such an image, or whether this is a reasonable avenue for customization. Part of why we don't do something like maintain and publish canonical Debian packages for Spark is because different organizations doing packaging and distribution of infrastructures or operating systems can reasonably want to do this in a custom (or non-customary) way. If there is really only one reasonable way to do a docker image, then my bias starts to tend more toward the Spark PMC taking on the responsibility to maintain and publish that image. If there is more than one way to do it and publishing a particular image is more just a convenience, then my bias tends more away from maintaining and publishing it.
>>>>>>
>>>>>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>> Source code is the primary release; compiled binary releases are conveniences that are also released. A docker image sounds fairly different, though. To the extent it's the standard delivery mechanism for some artifact (think: pyspark on PyPI as well) that makes sense, but is that the situation? If it's more of an extension or alternate presentation of Spark components, that typically wouldn't be part of a Spark release. The ones the PMC takes responsibility for maintaining ought to be the core, critical means of distribution alone.
>>>>>>>
>>>>>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We're all working towards the Kubernetes scheduler backend (full steam ahead!) that's targeted towards Spark 2.3. One of the questions that comes up often is docker images.
>>>>>>>>
>>>>>>>> While we're making dockerfiles available to allow people to create their own docker images from source, ideally we'd want to publish official docker images as part of the release process.
>>>>>>>>
>>>>>>>> I understand that the ASF has a procedure around this, and we would want to get that started to help us get these artifacts published by 2.3. I'd love to get a discussion around this started, and to hear the thoughts of the community regarding this.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Anirudh Ramanathan
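For readers following the thread, the factorization Erik and Anirudh describe amounts to something like the sketch below: a kube-specific driver image that is nothing more than a thin CMD/ENTRYPOINT layer on top of a shared "spark-base" image. This is only an illustration, not the actual Dockerfile from the Spark repository; the `entrypoint.sh` script, the `/opt/` paths, and the "driver" argument are assumptions made for the example (only the "spark-base" image name comes from the discussion above).

```dockerfile
# Sketch of a kube-specific driver image layered on a shared base image.
# Assumes a locally built "spark-base" image containing a Spark distribution;
# the script name, paths, and argument below are illustrative only.
FROM spark-base

# The kube-specific layer is just the thin command/entrypoint skin;
# the jars, bin/, and conf/ directories all come from the base image.
COPY entrypoint.sh /opt/entrypoint.sh
ENTRYPOINT [ "/opt/entrypoint.sh", "driver" ]
```

Downstream users who need extra dependencies baked in could likewise start `FROM` a published driver or executor image and add their own layer on top, which is the customization path Anirudh describes.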