Unfortunately you'll need to chase down the licenses of all the bits that
are distributed directly by the project. This was a big job back in the day
for the Maven artifacts, and it takes some work to maintain, but most of the
work is one-time, at least.

On Tue, Dec 19, 2017 at 12:53 PM Erik Erlandson <eerla...@redhat.com> wrote:

> Agreed that the GPL family would be "toxic."
>
> The current images have been at least informally confirmed to use licenses
> that are ASF compatible.  Is there an officially sanctioned method of
> license auditing that can be applied here?
>
> On Tue, Dec 19, 2017 at 11:45 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> I think that's all correct, though the licensing of third-party
>> dependencies is actually a difficult and sticky part. The ASF couldn't make
>> a software release including any GPL software, for example, and it's not
>> just a matter of adding a disclaimer. Any actual bits distributed by the
>> PMC would have to follow all the license rules.
>>
>> On Tue, Dec 19, 2017 at 12:34 PM Erik Erlandson <eerla...@redhat.com>
>> wrote:
>>
>>> I've been looking a bit more into the ASF's legal posture on licensing and
>>> container images. What I have found indicates that the ASF considers
>>> container images to be just another variety of distribution channel.  As
>>> such, it is acceptable to publish official releases; for example, an image
>>> such as spark:v2.3.0 built from the v2.3.0 source is fine.  It is not
>>> acceptable to do something like regularly publish spark:latest built from
>>> the head of master.
>>>
>>> More detail here:
>>> https://issues.apache.org/jira/browse/LEGAL-270
>>>
>>> So as I understand it, making a release-tagged public image as part of
>>> each official release does not pose any problems.
>>>
>>> With respect to considering the licenses of other ancillary dependencies
>>> that are also installed on such container images, I noticed this clause in
>>> the legal boilerplate for the Flink images
>>> <https://hub.docker.com/r/library/flink/>:
>>>
>>> As with all Docker images, these likely also contain other software
>>>> which may be under other licenses (such as Bash, etc from the base
>>>> distribution, along with any direct or indirect dependencies of the primary
>>>> software being contained).
>>>>
>>>
>>> So it may be sufficient to resolve this via a disclaimer.
>>>
>>> -Erik
>>>
>>> On Thu, Dec 14, 2017 at 7:55 PM, Erik Erlandson <eerla...@redhat.com>
>>> wrote:
>>>
>>>> Currently the containers are based on Alpine, which pulls in BSD2 and
>>>> MIT licensing:
>>>> https://github.com/apache/spark/pull/19717#discussion_r154502824
>>>>
>>>> To the best of my understanding, neither of those poses a problem.  If
>>>> we based the image on CentOS, I'd also expect the licensing of any image
>>>> deps to be compatible.
>>>>
>>>> On Thu, Dec 14, 2017 at 7:19 PM, Mark Hamstra <m...@clearstorydata.com>
>>>> wrote:
>>>>
>>>>> What licensing issues come into play?
>>>>>
>>>>> On Thu, Dec 14, 2017 at 4:00 PM, Erik Erlandson <eerla...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> We've been discussing the topic of container images a bit more.  The
>>>>>> Kubernetes back-end operates by executing some specific CMD and ENTRYPOINT
>>>>>> logic, which is different from Mesos and probably not practical to unify
>>>>>> at this level.
>>>>>>
>>>>>> However, these CMD and ENTRYPOINT configurations are essentially just a
>>>>>> thin skin on top of an image that is simply an install of a Spark distro.
>>>>>> We feel that a single "spark-base" image should be publishable: one that
>>>>>> is consumable by kube-spark images, mesos-spark images, and likely any
>>>>>> other community image whose primary purpose is running Spark components.
>>>>>> The kube-specific Dockerfiles would be written "FROM spark-base" and just
>>>>>> add the small CMD and ENTRYPOINT layers, as in the sketch below.  Likewise,
>>>>>> the mesos images could add any specialization layers that are necessary
>>>>>> on top of the "spark-base" image.
>>>>>>
>>>>>> Does this factorization sound reasonable to others?
>>>>>> Cheers,
>>>>>> Erik
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 29, 2017 at 10:04 AM, Mridul Muralidharan <
>>>>>> mri...@gmail.com> wrote:
>>>>>>
>>>>>>> We do support running on Apache Mesos via Docker images, so this
>>>>>>> would not be restricted to k8s.
>>>>>>> But unlike Mesos support, which has other modes of running, I believe
>>>>>>> k8s support depends more heavily on the availability of Docker images.
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mridul
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen <so...@cloudera.com>
>>>>>>> wrote:
>>>>>>> > Would it be logical to provide Docker-based distributions of other
>>>>>>> > pieces of Spark? Or is this specific to K8S?
>>>>>>> > The problem is that we wouldn't generally provide such a distribution
>>>>>>> > of Spark for the reasons you give; if we did, then why not RPMs and
>>>>>>> > so on.
>>>>>>> >
>>>>>>> > On Wed, Nov 29, 2017 at 10:41 AM Anirudh Ramanathan <
>>>>>>> ramanath...@google.com>
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> In this context, I think the Docker images are similar to the binaries
>>>>>>> >> rather than an extension.
>>>>>>> >> It's packaging the compiled distribution to save people the effort of
>>>>>>> >> building one themselves, akin to the binaries or the Python package.
>>>>>>> >>
>>>>>>> >> For reference, this is the base Dockerfile for the main image that we
>>>>>>> >> intend to publish. It's not particularly complicated.
>>>>>>> >> The driver and executor images are based on said base image and only
>>>>>>> >> customize the CMD (any file/directory inclusions are extraneous and
>>>>>>> >> will be removed).
>>>>>>> >>
>>>>>>> >> Is there only one way to build it? That's a bit harder to reason about.
>>>>>>> >> The base image, I'd argue, is likely always going to be built that way.
>>>>>>> >> For the driver and executor images, there may be cases where people
>>>>>>> >> want to customize them (like putting all dependencies into them, for
>>>>>>> >> example).
>>>>>>> >> In those cases, as long as our images are bare bones, they can use the
>>>>>>> >> spark-driver/spark-executor images we publish as the base, and build
>>>>>>> >> their customization as a layer on top, as sketched below.
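>>>>>>> >>
>>>>>>> >> A hypothetical example of such a customization layer (the image tag and
>>>>>>> >> paths below are purely illustrative, not something we've settled on):
>>>>>>> >>
>>>>>>> >>     # User-built image layering app dependencies onto a published image
>>>>>>> >>     FROM spark-driver:v2.3.0
>>>>>>> >>     COPY my-app.jar /opt/spark/jars/
>>>>>>> >>     COPY extra-deps/ /opt/spark/jars/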
>>>>>>> >>
>>>>>>> >> I think the composability of Docker images makes this a bit different
>>>>>>> >> from, say, Debian packages.
>>>>>>> >> We can publish canonical images that serve as both a complete image for
>>>>>>> >> most Spark applications and a stable substrate to build customization
>>>>>>> >> upon.
>>>>>>> >>
>>>>>>> >> On Wed, Nov 29, 2017 at 7:38 AM, Mark Hamstra <
>>>>>>> m...@clearstorydata.com>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> It's probably also worth considering whether there is only one,
>>>>>>> >>> well-defined, correct way to create such an image or whether this is a
>>>>>>> >>> reasonable avenue for customization. Part of why we don't do something
>>>>>>> >>> like maintain and publish canonical Debian packages for Spark is that
>>>>>>> >>> different organizations doing packaging and distribution of
>>>>>>> >>> infrastructures or operating systems can reasonably want to do this in
>>>>>>> >>> a custom (or non-customary) way. If there is really only one reasonable
>>>>>>> >>> way to do a Docker image, then my bias starts to tend more toward the
>>>>>>> >>> Spark PMC taking on the responsibility to maintain and publish that
>>>>>>> >>> image. If there is more than one way to do it and publishing a
>>>>>>> >>> particular image is more just a convenience, then my bias tends more
>>>>>>> >>> away from maintaining and publishing it.
>>>>>>> >>>
>>>>>>> >>> On Wed, Nov 29, 2017 at 5:14 AM, Sean Owen <so...@cloudera.com>
>>>>>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>> Source code is the primary release; compiled binary releases are
>>>>>>> >>>> conveniences that are also released. A Docker image sounds fairly
>>>>>>> >>>> different, though. To the extent it's the standard delivery mechanism
>>>>>>> >>>> for some artifact (think: pyspark on PyPI as well), that makes sense,
>>>>>>> >>>> but is that the situation? If it's more of an extension or alternate
>>>>>>> >>>> presentation of Spark components, that typically wouldn't be part of
>>>>>>> >>>> a Spark release. The ones the PMC takes responsibility for maintaining
>>>>>>> >>>> ought to be the core, critical means of distribution alone.
>>>>>>> >>>>
>>>>>>> >>>> On Wed, Nov 29, 2017 at 2:52 AM Anirudh Ramanathan
>>>>>>> >>>> <ramanath...@google.com.invalid> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> Hi all,
>>>>>>> >>>>>
>>>>>>> >>>>> We're all working towards the Kubernetes scheduler backend (full
>>>>>>> >>>>> steam ahead!) that's targeted at Spark 2.3. One of the questions that
>>>>>>> >>>>> comes up often is Docker images.
>>>>>>> >>>>>
>>>>>>> >>>>> While we're making Dockerfiles available to allow people to create
>>>>>>> >>>>> their own Docker images from source, ideally we'd want to publish
>>>>>>> >>>>> official Docker images as part of the release process.
>>>>>>> >>>>>
>>>>>>> >>>>> I understand that the ASF has procedures around this, and we would
>>>>>>> >>>>> want to get that started to help us get these artifacts published by
>>>>>>> >>>>> 2.3. I'd love to get a discussion started around this and hear the
>>>>>>> >>>>> thoughts of the community.
>>>>>>> >>>>>
>>>>>>> >>>>> --
>>>>>>> >>>>> Thanks,
>>>>>>> >>>>> Anirudh Ramanathan
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Anirudh Ramanathan
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
