Currently the containers are based off alpine, which pulls in BSD2 and MIT licensing: https://github.com/apache/spark/pull/19717#discussion_r154502824
To the best of my understanding, neither of those poses a problem. If we based the image off of centos I'd also expect the licensing of any image deps to be compatible.

On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> What licensing issues come into play?
>
> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerla...@redhat.com> wrote:
>> We've been discussing the topic of container images a bit more. The kubernetes back-end operates by executing some specific CMD and ENTRYPOINT logic, which is different than mesos, and which is probably not practical to unify at this level.
>>
>> However: these CMD and ENTRYPOINT configurations are essentially just a thin skin on top of an image which is just an install of a spark distro. We feel that a single "spark-base" image should be publishable, one that is consumable by kube-spark images, mesos-spark images, and likely any other community image whose primary purpose is running spark components. The kube-specific dockerfiles would be written "FROM spark-base" and just add the small command and entrypoint layers. Likewise, the mesos images could add any specialization layers that are necessary on top of the "spark-base" image.
>>
>> Does this factorization sound reasonable to others?
>> Cheers,
>> Erik
>>
>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>> We do support running on Apache Mesos via docker images - so this would not be restricted to k8s. But unlike mesos support, which has other modes of running, I believe k8s support more heavily depends on the availability of docker images.
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <so...@cloudera.com> wrote:
>>>> Would it be logical to provide Docker-based distributions of other pieces of Spark? Or is this specific to K8S? The problem is we wouldn't generally also provide a distribution of Spark for the reasons you give, because if that, then why not RPMs and so on.
>>>>
>>>> On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <ramanath...@google.com> wrote:
>>>>> In this context, I think the docker images are similar to the binaries rather than an extension. It's packaging the compiled distribution to save people the effort of building one themselves, akin to binaries or the python package.
>>>>>
>>>>> For reference, this is the base dockerfile for the main image that we intend to publish. It's not particularly complicated. The driver and executor images are based on said base image and only customize the CMD (any file/directory inclusions are extraneous and will be removed).
>>>>>
>>>>> Is there only one way to build it? That's a bit harder to reason about. The base image, I'd argue, is likely going to always be built that way. For the driver and executor images, there may be cases where people want to customize them (like putting all dependencies into them, for example). In those cases, as long as our images are bare bones, they can use the spark-driver/spark-executor images we publish as the base and build their customization as a layer on top.
>>>>>
>>>>> I think the composability of docker images makes this a bit different from, say, debian packages.
>>>>> We can publish canonical images that serve as both a complete image for most Spark applications and a stable substrate to build customization upon.
>>>>>
>>>>> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>>> It's probably also worth considering whether there is only one, well-defined, correct way to create such an image, or whether this is a reasonable avenue for customization. Part of why we don't do something like maintain and publish canonical Debian packages for Spark is because different organizations doing packaging and distribution of infrastructures or operating systems can reasonably want to do this in a custom (or non-customary) way. If there is really only one reasonable way to do a docker image, then my bias starts to tend more toward the Spark PMC taking on the responsibility to maintain and publish that image. If there is more than one way to do it and publishing a particular image is more just a convenience, then my bias tends more away from maintaining and publishing it.
>>>>>>
>>>>>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>> Source code is the primary release; compiled binary releases are conveniences that are also released. A docker image sounds fairly different, though. To the extent it's the standard delivery mechanism for some artifact (think: pyspark on PyPI as well) that makes sense, but is that the situation? If it's more of an extension or alternate presentation of Spark components, that typically wouldn't be part of a Spark release. The ones the PMC takes responsibility for maintaining ought to be the core, critical means of distribution alone.
>>>>>>>
>>>>>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan <ramanath...@google.com.invalid> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We're all working towards the Kubernetes scheduler backend (full steam ahead!) that's targeted towards Spark 2.3. One of the questions that comes up often is docker images.
>>>>>>>>
>>>>>>>> While we're making dockerfiles available to allow people to create their own docker images from source, ideally we'd want to publish official docker images as part of the release process.
>>>>>>>>
>>>>>>>> I understand that the ASF has a procedure around this, and we would want to get that started to help us get these artifacts published by 2.3. I'd love to get a discussion around this started, and to hear the thoughts of the community regarding this.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> Anirudh Ramanathan
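For readers following the thread, the factorization Erik and Anirudh describe amounts to something like the sketch below: a kube-specific driver image that is nothing more than a thin CMD/ENTRYPOINT layer on top of a shared "spark-base" image. This is only an illustration, not the actual Dockerfile from the Spark repository; the `entrypoint.sh` script, the `/opt/` paths, and the "driver" argument are assumptions made for the example (only the "spark-base" image name comes from the discussion above).

```dockerfile
# Sketch of a kube-specific driver image layered on a shared base image.
# Assumes a locally built "spark-base" image containing a Spark distribution;
# the script name, paths, and argument below are illustrative only.
FROM spark-base

# The kube-specific layer is just the thin command/entrypoint skin;
# the jars, bin/, and conf/ directories all come from the base image.
COPY entrypoint.sh /opt/entrypoint.sh
ENTRYPOINT [ "/opt/entrypoint.sh", "driver" ]
```

Downstream users who need extra dependencies baked in could likewise start `FROM` a published driver or executor image and add their own layer on top, which is the customization path Anirudh describes.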