Since Spark 3.1.1 is out now, I was wondering if it would make sense to
try to get some consensus about starting to release Docker images as
part of Spark 3.2.
Having ready-to-use images would definitely benefit adoption, in
particular now that containerized runs via Kubernetes (k8s) have become GA.
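
As a concrete illustration of what "ready to use" could look like for k8s
users (the image name, tag, and jar path below are purely hypothetical
placeholders, not a proposal), an official image could be referenced
directly from spark-submit:

    spark-submit \
      --master k8s://https://<k8s-apiserver>:6443 \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.kubernetes.container.image=apache/spark:3.2.0 \
      local:///opt/spark/examples/jars/spark-examples_2.12-3.2.0.jar

instead of everyone first having to build and host their own image.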

WDYT? Are there still some issues/blockers or reasons to not move forward?

On Tue, Feb 18, 2020 at 2:29 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>
> +1 to having Spark Docker images, for the reasons Dongjoon gave: a
> container-based distribution is definitely something that benefits both
> users and the project. Having this in the Apache Spark repo matters because
> multiple eyes can fix/improve the images for the benefit of everyone.
>
> What still needs to be tested is the best distribution approach. I have
> been involved in both Flink's and Beam's Docker image processes (and went
> through the whole 'docker official image' validation), and one of the
> lessons learnt is that the less you put in an image, the better it is for
> everyone. So I wonder whether including everything in the world (Python, R,
> etc.) would scale, or whether those should be overlays on top of a more
> minimal core image, but those are details to fix once consensus on this is
> agreed.
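>
> As a rough sketch of what I mean by overlays (these are purely hypothetical
> Dockerfiles, with made-up image names and paths, not a concrete proposal):
>
>     # Dockerfile for a minimal core image: a JRE plus the Spark distribution
>     FROM openjdk:11-jre-slim
>     COPY spark /opt/spark
>     ENV SPARK_HOME=/opt/spark PATH="/opt/spark/bin:${PATH}"
>
>     # Dockerfile for a Python overlay, built FROM the core image above
>     # ("spark:3.2.0" is only a placeholder tag for that core image)
>     FROM spark:3.2.0
>     RUN apt-get update && \
>         apt-get install -y --no-install-recommends python3 python3-pip && \
>         rm -rf /var/lib/apt/lists/*
>
> That way JVM-only users pull a much smaller image, and the Python/R layers
> can be rebuilt and evolve independently of the core one.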
>
> On the Apache INFRA side there is some stuff to deal with at the beginning,
> but things become smoother once the pieces are in place. In any case, it's a
> fantastic idea, and if I can help out I would be glad to.
>
> Regards,
> Ismaël
>
> On Tue, Feb 11, 2020 at 10:56 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> Hi, Sean.
>>
>> Yes. We should keep this minimal.
>>
>> BTW, for the following question,
>>
>>     > But how much value does that add?
>>
>> How much value do you think we get from our binary distribution at the
>> following link?
>>
>>     - https://www.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
>>
>> A Docker image can provide similar value to the above for users who work in
>> a Dockerized environment.
>>
>> If you are assuming users who build from source or live on vendor
>> distributions, then neither the existing binary distribution link above nor
>> a Docker image has any value.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Tue, Feb 11, 2020 at 8:51 AM Sean Owen <sro...@gmail.com> wrote:
>>>
>>> To be clear, this is a convenience 'binary' for end users, not just an
>>> internal packaging to aid the testing framework?
>>>
>>> There's nothing wrong with providing an additional official packaging
>>> if we vote on it and it follows all the rules. There is an open
>>> question about how much value it adds versus the maintenance it requires.
>>> I see we do already have some Dockerfiles, sure. Is it possible to reuse or
>>> repurpose these so that we don't have more to maintain? Or: what is
>>> different from the existing Dockerfiles here? (Dumb question, I've never
>>> paid much attention to them.)
>>>
>>> We definitely can't release GPL bits or anything, yes. Just releasing
>>> a Dockerfile referring to GPL bits is a gray area - no bits are being
>>> redistributed, but does it constitute a derived work where the GPL
>>> stuff is a non-optional dependency? Would any publishing of these
>>> images cause us to put a copy of third-party GPL code anywhere?
>>>
>>> At the least, we should keep this minimal: one image if possible, which
>>> you overlay on top of your preferred OS/Java/Python image. But how
>>> much value does that add? I have no information either way about whether
>>> people actually want or need such a thing.
>>>
>>> On Tue, Feb 11, 2020 at 10:13 AM Erik Erlandson <eerla...@redhat.com> wrote:
>>> >
>>> > My takeaway from the last time we discussed this was:
>>> > 1) To be ASF compliant, we needed to publish images only at official
>>> > releases.
>>> > 2) There was some ambiguity about whether or not a container image that
>>> > includes GPL'ed packages (Spark images do) might trip over GPL "viral
>>> > propagation" by integrating ASL and GPL code in a "binary release". The
>>> > GPL "air gap" provision may apply, since the GPL software interacts only
>>> > at command-line boundaries.
>>> >
>>> > On Wed, Feb 5, 2020 at 1:23 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>> >>
>>> >> Hi, All.
>>> >>
>>> >> Starting in 2020, shall we have an official Docker image repository as
>>> >> an additional distribution channel?
>>> >>
>>> >> I'm considering the following images.
>>> >>
>>> >>     - Public binary release (no snapshot image)
>>> >>     - Public non-Spark base image (OS + R + Python)
>>> >>       (This can be used in GitHub Actions jobs and the Jenkins K8s
>>> >>       integration tests to speed up jobs and to have more stable
>>> >>       environments.)
>>> >>
>>> >> Bests,
>>> >> Dongjoon.
