I think that's all correct, though the licensing of third-party dependencies is actually a difficult and sticky part. The ASF couldn't make a software release including any GPL software, for example, and it's not just a matter of adding a disclaimer. Any actual bits distributed by the PMC would have to follow all the license rules.
On Tue, Dec 19, 2017 at 12:34 PM Erik Erlandson <eerla...@redhat.com> wrote:

> I've been looking a bit more into the ASF legal posture on licensing and
> container images. What I have found indicates that the ASF considers
> container images to be just another variety of distribution channel. As
> such, it is acceptable to publish official releases; for example, an image
> such as spark:v2.3.0 built from the v2.3.0 source is fine. It is not
> acceptable to do something like regularly publish spark:latest built from
> the head of master.
>
> More detail here:
> https://issues.apache.org/jira/browse/LEGAL-270
>
> So as I understand it, making a release-tagged public image as part of
> each official release does not pose any problems.
>
> With respect to considering the licenses of other ancillary dependencies
> that are also installed on such container images, I noticed this clause
> in the legal boilerplate for the Flink images
> <https://hub.docker.com/r/library/flink/>:
>
>> As with all Docker images, these likely also contain other software
>> which may be under other licenses (such as Bash, etc. from the base
>> distribution, along with any direct or indirect dependencies of the
>> primary software being contained).
>
> So it may be sufficient to resolve this via disclaimer.
>
> -Erik
>
> On Thu, Dec 14, 2017 at 7:55 PM, Erik Erlandson <eerla...@redhat.com>
> wrote:
>
>> Currently the containers are based off alpine, which pulls in BSD-2 and
>> MIT licensing:
>> https://github.com/apache/spark/pull/19717#discussion_r154502824
>>
>> To the best of my understanding, neither of those poses a problem. If we
>> based the image off of centos, I'd also expect the licensing of any
>> image deps to be compatible.
>>
>> On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <m...@clearstorydata.com>
>> wrote:
>>
>>> What licensing issues come into play?
>>>
>>> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerla...@redhat.com>
>>> wrote:
>>>
>>>> We've been discussing the topic of container images a bit more. The
>>>> kubernetes back-end operates by executing some specific CMD and
>>>> ENTRYPOINT logic, which is different from mesos, and which is probably
>>>> not practical to unify at this level.
>>>>
>>>> However, these CMD and ENTRYPOINT configurations are essentially just
>>>> a thin skin on top of an image which is just an install of a spark
>>>> distro. We feel that a single "spark-base" image should be publishable,
>>>> one that is consumable by kube-spark images, mesos-spark images, and
>>>> likely any other community image whose primary purpose is running
>>>> spark components. The kube-specific dockerfiles would be written
>>>> "FROM spark-base" and just add the small command and entrypoint
>>>> layers. Likewise, the mesos images could add any specialization layers
>>>> that are necessary on top of the "spark-base" image.
>>>>
>>>> Does this factorization sound reasonable to others?
>>>> Cheers,
>>>> Erik
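A minimal sketch of the factorization Erik proposes, for illustration only; the "spark-base" name, the base image, and the entrypoint script are assumptions, not published artifacts:

    # spark-base/Dockerfile (hypothetical): a plain install of a spark
    # distro, with no scheduler-specific CMD or ENTRYPOINT logic
    FROM openjdk:8-jre-alpine
    COPY spark-dist /opt/spark
    ENV SPARK_HOME /opt/spark
    WORKDIR /opt/spark

    # kubernetes/driver/Dockerfile (hypothetical): a thin layer adding
    # only the kube-specific entrypoint on top of spark-base
    FROM spark-base
    COPY entrypoint.sh /opt/entrypoint.sh
    ENTRYPOINT ["/opt/entrypoint.sh", "driver"]

A mesos image would follow the same pattern, adding its own specialization layers FROM spark-base.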
>>>>
>>>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan
>>>> <mri...@gmail.com> wrote:
>>>>
>>>>> We do support running on Apache Mesos via docker images - so this
>>>>> would not be restricted to k8s.
>>>>> But unlike mesos support, which has other modes of running, I believe
>>>>> k8s support more heavily depends on the availability of docker images.
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>> > Would it be logical to provide Docker-based distributions of other
>>>>> > pieces of Spark, or is this specific to K8S?
>>>>> > The problem is we wouldn't generally also provide a distribution of
>>>>> > Spark for the reasons you give, because if we did that, then why
>>>>> > not RPMs and so on.
>>>>> >
>>>>> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan
>>>>> > <ramanath...@google.com> wrote:
>>>>> >>
>>>>> >> In this context, I think the docker images are similar to the
>>>>> >> binaries rather than an extension.
>>>>> >> It's packaging the compiled distribution to save people the effort
>>>>> >> of building one themselves, akin to binaries or the python package.
>>>>> >>
>>>>> >> For reference, this is the base dockerfile for the main image that
>>>>> >> we intend to publish. It's not particularly complicated.
>>>>> >> The driver and executor images are based on said base image and
>>>>> >> only customize the CMD (any file/directory inclusions are
>>>>> >> extraneous and will be removed).
>>>>> >>
>>>>> >> Is there only one way to build it? That's a bit harder to reason
>>>>> >> about. The base image, I'd argue, is likely always going to be
>>>>> >> built that way. For the driver and executor images, there may be
>>>>> >> cases where people want to customize them (like putting all
>>>>> >> dependencies into them, for example).
>>>>> >> In those cases, as long as our images are bare bones, they can use
>>>>> >> the spark-driver/spark-executor images we publish as the base, and
>>>>> >> build their customization as a layer on top of it.
>>>>> >>
>>>>> >> I think the composability of docker images makes this a bit
>>>>> >> different from, say, debian packages.
>>>>> >> We can publish canonical images that serve as both a complete
>>>>> >> image for most Spark applications and a stable substrate to build
>>>>> >> customization upon.
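A sketch of the customization pattern described above, assuming a hypothetical published image name and tag (spark-driver:v2.3.0); a user adds application jars and extra dependencies as a layer on top of the bare-bones published image:

    # user-side Dockerfile (hypothetical image name and tag)
    FROM spark-driver:v2.3.0
    # add the application assembly and any extra dependencies as a new
    # layer; the CMD of the published image is inherited unchanged
    COPY my-app-assembly.jar /opt/spark/jars/
    COPY extra-deps/ /opt/spark/jars/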
>>>>> >>
>>>>> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra
>>>>> >> <m...@clearstorydata.com> wrote:
>>>>> >>>
>>>>> >>> It's probably also worth considering whether there is only one,
>>>>> >>> well-defined, correct way to create such an image or whether this
>>>>> >>> is a reasonable avenue for customization. Part of why we don't do
>>>>> >>> something like maintain and publish canonical Debian packages for
>>>>> >>> Spark is because different organizations doing packaging and
>>>>> >>> distribution of infrastructures or operating systems can
>>>>> >>> reasonably want to do this in a custom (or non-customary) way. If
>>>>> >>> there is really only one reasonable way to do a docker image,
>>>>> >>> then my bias starts to tend more toward the Spark PMC taking on
>>>>> >>> the responsibility to maintain and publish that image. If there
>>>>> >>> is more than one way to do it and publishing a particular image
>>>>> >>> is more just a convenience, then my bias tends more away from
>>>>> >>> maintaining and publishing it.
>>>>> >>>
>>>>> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Source code is the primary release; compiled binary releases are
>>>>> >>>> conveniences that are also released. A docker image sounds
>>>>> >>>> fairly different, though. To the extent it's the standard
>>>>> >>>> delivery mechanism for some artifact (think: pyspark on PyPI as
>>>>> >>>> well) that makes sense, but is that the situation? If it's more
>>>>> >>>> of an extension or alternate presentation of Spark components,
>>>>> >>>> that typically wouldn't be part of a Spark release. The ones the
>>>>> >>>> PMC takes responsibility for maintaining ought to be the core,
>>>>> >>>> critical means of distribution alone.
>>>>> >>>>
>>>>> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
>>>>> >>>> <ramanath...@google.com.invalid> wrote:
>>>>> >>>>>
>>>>> >>>>> Hi all,
>>>>> >>>>>
>>>>> >>>>> We're all working towards the Kubernetes scheduler backend
>>>>> >>>>> (full steam ahead!) that's targeted towards Spark 2.3. One of
>>>>> >>>>> the questions that comes up often is docker images.
>>>>> >>>>>
>>>>> >>>>> While we're making dockerfiles available to allow people to
>>>>> >>>>> create their own docker images from source, ideally we'd want
>>>>> >>>>> to publish official docker images as part of the release
>>>>> >>>>> process.
>>>>> >>>>>
>>>>> >>>>> I understand that the ASF has procedure around this, and we
>>>>> >>>>> would want to get that started to help us get these artifacts
>>>>> >>>>> published by 2.3. I'd love to get a discussion around this
>>>>> >>>>> started, and hear the thoughts of the community regarding this.
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> Thanks,
>>>>> >>>>> Anirudh Ramanathan
>>>>> >>
>>>>> >> --
>>>>> >> Anirudh Ramanathan
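For concreteness, a hedged sketch of what a per-release image build could look like, combining the publishing proposal above with the LEGAL-270 guidance cited earlier in the thread; the image name, tag, and distribution filename are assumptions for illustration:

    # release Dockerfile (hypothetical): built once per official release
    # from the released binary distribution, tagged with the release
    # version only (never a moving "latest" built from master), e.g.:
    #   docker build --build-arg SPARK_VERSION=2.3.0 -t spark:v2.3.0 .
    FROM openjdk:8-jre-alpine
    ARG SPARK_VERSION
    # ADD auto-extracts the local release tarball into /opt
    ADD spark-${SPARK_VERSION}-bin-hadoop2.7.tgz /opt/
    RUN ln -s /opt/spark-${SPARK_VERSION}-bin-hadoop2.7 /opt/spark
    ENV SPARK_HOME /opt/spark
    WORKDIR /opt/spark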