Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Yikun Jiang
As I said in my last mail, I am very sorry for any confusion this may have caused.

As you know, it is a little bit complicated to take K8s, the base
image, standalone mode, the Docker Official Image, etc. into account, as well
as various docker image requirements such as the Java version and the tag
scheme. Of course, that is not an excuse.

I also think the most important thing right now is to let the community
know what is going on, identify where consensus is missing, and reach one.
For the latest tag behavior changes (Java version, tag rules, image size),
I have also explained the original intention.
Collecting community feedback will help us take the next step.

Let me add some more information to help us move forward:
- The docker images for *v3.4.0* have been published under both the *previous
tag (v3.4.0)* and the *new tags (3.4.0 / python / r / all-in-one)*.
- The *latest* tag change is a user-facing behavior change compared to the
previous tag: it now points to the new python tag. In detail:
  * It points to the python image rather than the scala image, mainly
considering that PySpark usage is more common than Scala.
  * The python image is 490+ MB, while the scala image is 400+ MB.
  * The default Java version is Java 11, whereas the previous tag (v3.4.0)
uses Java 17.
- The docker image publish workflow has been updated to the new workflow:
https://github.com/apache/spark-website/pull/458
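
If it helps the discussion, the current published state can be checked
locally like this (a quick sketch; it only uses the tags listed above):

    # Pull the previous-style tag, the new-style tag, and latest.
    docker pull apache/spark:v3.4.0    # previous tag (scala image, Java 17)
    docker pull apache/spark:3.4.0     # new tag (python image, Java 11)
    docker pull apache/spark:latest

    # Compare the local image sizes side by side.
    docker images apache/spark

    # Show which digest the latest tag currently resolves to.
    docker inspect --format '{{index .RepoDigests 0}}' apache/spark:latest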

From my perspective, the next steps are:
- Decide the default Java version for the latest tag
- Decide the default image for the latest tag
- Decide whether we should update the publish workflow


On Wed, May 10, 2023 at 12:11 AM Dongjoon Hyun  wrote:

> May I ask why you think that sentence of the SPIP, "might need to deprecate
> ...", decided anything at that time?
>
> From my perspective,
> - `might need to` suggested only a possible necessity at some point in the
> future.
> - `deprecation` means no breaking change.
>
>
> Dongjoon

Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Mich Talebzadeh
Hi,

This has already been discussed a few times, notably in August 2022 under
the topic "Time to start publishing Spark Docker Images?".

Having said that, building a docker image is a trivial job, taking no more
than a few minutes. Besides, most cloud vendors have their own specific tags.
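
For instance, with an unpacked Spark distribution, the bundled
docker-image-tool.sh can build a local image in a few minutes (a sketch;
`myrepo` and the tag are placeholders):

    # From the root of an unpacked Spark distribution:
    ./bin/docker-image-tool.sh -r myrepo -t 3.4.0-custom build
    # Push it wherever needed:
    #   ./bin/docker-image-tool.sh -r myrepo -t 3.4.0-custom push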

In general, from a practical point of view, the docker image should be able
to interact with backend databases etc. that may not yet support Java 17.


To me, building an official docker image is essentially an academic exercise.

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London
United Kingdom


view my Linkedin profile
https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.






Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Yikun Jiang
> 1. The size regression: the `apache/spark:3.4.0` tag is claimed to be a
replacement of the existing `apache/spark:v3.4.0`. However, 3.4.0 is 500MB
while the original v3.4.0 is 405MB; a 25% increase is huge.

> 2. Accidental overwrite: `apache/spark:latest` was accidentally
overwritten by the `apache/spark:python3` image, which is bigger due to
the additional Python binaries. This is a breaking change that forces
downstream users to switch to something like `apache/spark:scala`.

Just FYI, we also had a discussion about the tag policy (latest/3.4.0) and
also a rough size estimation [1] in "SPIP: Support Docker Official Image for
Spark".

[1]
https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o/edit?disco=f2TyFr0

Regards,
Yikun



Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Dongjoon Hyun
The whole content of the SPIP (Support Docker Official Image for Spark) aims
to add (1) as something new, not to corrupt or destroy the existing (2).

(1) https://hub.docker.com/_/spark
(2) https://hub.docker.com/r/apache/spark/tags

The reference model repos were also documented, as follows.

https://hub.docker.com/_/flink
https://hub.docker.com/_/storm
https://hub.docker.com/_/solr
https://hub.docker.com/_/zookeeper

In short, according to the SPIP's `Docker Official Image` definition, new
images should go only to (1) in order to achieve `Support Docker Official
Image for Spark`, shouldn't they?

Dongjoon.



Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Dongjoon Hyun
May I ask why you think that sentence of the SPIP, "might need to deprecate
...", decided anything at that time?

From my perspective,
- `might need to` suggested only a possible necessity at some point in the
future.
- `deprecation` means no breaking change.


Dongjoon




Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-09 Thread Yikun Jiang
> It seems that your reply (the following) didn't reach the mailing
list correctly.

Thanks! I'm not sure what happened before; thanks for forwarding it.

> Let me add my opinion. IIUC, the whole content of the SPIP (Support Docker
Official Image for Spark) aims to add (1) as something new, not to corrupt or
destroy the existing (2).

- There was a description in the doc of how we should handle the apache/spark
image after DOI support:
"Considering that already had the apache/spark image, might need to
deprecate: spark/spark-py/spark-r `v3.3.0`, `v3.1.3`, `v3.2.1`, `v3.2.2`
tags, and *unified apache/spark image tags to docker official images tags
rule*, and also still keep apache/spark images and update apache/spark
images when released."
- I also posted a mail,
https://lists.apache.org/thread/zp550lt4f098zfpxgpc9bn360bwcfhs4, in Nov.
2022; it was about the Apache Spark official image, not the Docker Official
Image.

So, it covers not only the Docker Official Image (spark) but also the Apache
Spark official image (apache/spark).
Anyway, I am very sorry for any confusion; really, many thanks for
your feedback and review.

>>> > 

Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-08 Thread Dongjoon Hyun
To Yikun,

It seems that your reply (the following) didn't reach the mailing
list correctly.

> Just FYI, we also had a discussion about the tag policy (latest/3.4.0) and
also a rough size estimation [1] in "SPIP: Support Docker Official Image for
Spark".
>
https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o/edit?disco=f2TyFr0

Let me add my opinion. IIUC, the whole content of the SPIP (Support Docker
Official Image for Spark) aims to add (1) as something new, not to corrupt or
destroy the existing (2).

(1) https://hub.docker.com/_/spark
(2) https://hub.docker.com/r/apache/spark/tags

The reference model repos were also documented, as follows.

https://hub.docker.com/_/flink
https://hub.docker.com/_/storm
https://hub.docker.com/_/solr
https://hub.docker.com/_/zookeeper

In short, according to the SPIP's `Docker Official Image` definition, new
images should go only to (1) in order to achieve `Support Docker Official
Image for Spark`, shouldn't they?

Dongjoon.


Re: [DISCUSS] Unified Apache Spark Docker image tag?

2023-05-08 Thread Dongjoon Hyun
Thank you for initiating the discussion in the community. Yes, we need to give 
more context in the dev mailing list.

The root cause is not SPARK-40941 or SPARK-40513. Technically, this
situation started 16 days ago with SPARK-43148, which made some breaking
changes.

https://github.com/apache/spark-docker/pull/33
SPARK-43148 Add Apache Spark 3.4.0 Dockerfiles

1. The size regression: the `apache/spark:3.4.0` tag is claimed to be a
replacement of the existing `apache/spark:v3.4.0`. However, 3.4.0 is 500MB
while the original v3.4.0 is 405MB; a 25% increase is huge.

2. Accidental overwrite: `apache/spark:latest` was accidentally overwritten by
the `apache/spark:python3` image, which is bigger due to the additional
Python binaries. This is a breaking change that forces downstream users to
switch to something like `apache/spark:scala`.
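
For downstream users who are affected right now, pinning an explicit tag
instead of `latest` is a workaround (a sketch using the tags published above):

    # Workaround: pin an explicit tag rather than relying on `latest`.
    docker pull apache/spark:3.4.0-scala
    # The same idea inside a Dockerfile:
    #   FROM apache/spark:3.4.0-scala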

I believe (1) and (2) were our mistakes; we had better fix them ASAP.
For the Java question, I prefer to be consistent with the apache/spark repo's
default.

Dongjoon.




[DISCUSS] Unified Apache Spark Docker image tag?

2023-05-08 Thread Yikun Jiang
This is a call for discussion on how we can smoothly unify the Apache Spark
Docker image tags.

As you might know, there is an apache/spark-docker
 repo that stores the dockerfiles and
helps publish the docker images; it is also intended to replace the original
manual publish workflow.

The scope of the new images is to cover the previous image use cases (K8s /
docker run) and also the base image, standalone mode, and the Docker Official
Image.

- (Previous) apache/spark:v3.4.0, apache/spark-py:v3.4.0,
apache/spark-r:v3.4.0

* The images are built from the apache/spark "Spark on K8s" dockerfiles


* Java version: Java 17 (it was Java 11 before v3.4.0, e.g. in
v3.3.0/v3.3.1/v3.3.2); Java 17 was made the default by SPARK-40941
.

* Support: K8s / docker run

* See also: Time to start publishing Spark Docker Images


* Link: https://hub.docker.com/r/apache/spark-py,
https://hub.docker.com/r/apache/spark-r,
https://hub.docker.com/r/apache/spark

- (New) apache/spark:3.4.0-python3 (3.4.0/latest), apache/spark:3.4.0-r,
apache/spark:3.4.0-scala, and also an all-in-one image:
apache/spark:3.4.0-scala2.12-java11-python3-r-ubuntu

* The images are built from the apache/spark-docker dockerfiles


* Java version: Java 11; Java 17 support is added by SPARK-40513
 (under review)

* Support: K8s / docker run / base image / standalone / Docker Official
Image

* See detail in: Support Docker Official Image for Spark


* About dropping prefix `v`:
https://github.com/docker-library/official-images/issues/14506

* Link: https://hub.docker.com/r/apache/spark
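
To make the tag change concrete, here is how the previous and the new images
would be pulled (a sketch; only the tags listed above are used):

    # Previous scheme: one repo per language binding, "v"-prefixed tags.
    docker pull apache/spark:v3.4.0
    docker pull apache/spark-py:v3.4.0
    docker pull apache/spark-r:v3.4.0

    # New scheme: a single apache/spark repo with suffix-based tags, no "v" prefix.
    docker pull apache/spark:3.4.0-scala
    docker pull apache/spark:3.4.0-python3    # also tagged as 3.4.0 and latest
    docker pull apache/spark:3.4.0-r
    docker pull apache/spark:3.4.0-scala2.12-java11-python3-r-ubuntu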

We had some initial discussion on spark-website#458
,
mainly around the version tag and the default Java version behavior
changes, so we’d like to hear your ideas here on the questions below:

*#1. Which Java version should be used by default (latest tag)? Java 8,
Java 11, Java 17, or Any*

*#2. Which tag should be used in apache/spark? v3.4.0 (with the v prefix),
3.4.0 (dropping the v prefix), Both, or Any*
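
For #2, the two candidate schemes side by side (a sketch):

    docker pull apache/spark:v3.4.0    # with the "v" prefix (previous rule)
    docker pull apache/spark:3.4.0     # without the "v" prefix (DOI-style rule)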

Starting with my preference:

1. Java 8 or Java 17 are both OK to me (mainly considering the Java
maintenance cycle). BTW, in other Apache projects: flink (8/11, 11 as default
),
solr (11 as default

for 8.x, 17 as default

since solr9), zookeeper (11 as default

) — a quick way to check what a given tag actually ships is sketched after
this list.

2. Only 3.4.0 (dropping the v prefix). It will help us transition to the new
tags with less confusion, and it also follows the DOI suggestions
.
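
The quick check mentioned in item 1: whichever default we pick, it is easy to
verify which Java runtime a tag actually ships (a sketch, assuming `java` is
on the image's PATH):

    # Override the entrypoint to print the bundled Java version.
    docker run --rm --entrypoint java apache/spark:v3.4.0 -version    # Java 17 per the previous tag above
    docker run --rm --entrypoint java apache/spark:3.4.0 -version     # Java 11 per the new tag above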

Please feel free to share your ideas.