Hi all,

Last month, the vote on "Support Docker Official Image for Spark
<https://issues.apache.org/jira/browse/SPARK-40513>" passed.

# Progress of the SPIP:

## Completed:
- A new GitHub repo was created: https://github.com/apache/spark-docker
- Added a "Spark Docker
<https://issues.apache.org/jira/browse/SPARK-40969?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20%22Spark%20Docker%22>"
component in JIRA
- Uploaded the 3.3.0/3.3.1 Dockerfiles: spark-docker#2
<https://github.com/apache/spark-docker/pull/2> and spark-docker#20
<https://github.com/apache/spark-docker/pull/20>
- Applied some fixes to the Dockerfiles to meet the DOI quality requirements:
  * spark-docker#11 <https://github.com/apache/spark-docker/pull/11> Use
spark as the username in the official image (instead of the magic number 185)
  * spark-docker#14 <https://github.com/apache/spark-docker/pull/14> Clean
up the OS package list cache to reduce image size
  * spark-docker#17 <https://github.com/apache/spark-docker/pull/17> Remove
the dynamic pip/setuptools upgrade to ensure image reproducibility
- Added a Dockerfile template to help generate all the Dockerfile variants
for a specific version: spark-docker#12
<https://github.com/apache/spark-docker/pull/12>
- Added workflows to build/test the Dockerfiles to ensure their quality:
  * K8s Integration test spark-docker#9
<https://github.com/apache/spark-docker/pull/9>
  * Standalone test spark-docker#21
<https://github.com/apache/spark-docker/pull/21> (Great job by @dcoliversun)
- spark-website#424 <https://github.com/apache/spark-website/pull/424> Use
the Docker image in the SQL/Scala/Java examples
- INFRA-23882 <https://issues.apache.org/jira/browse/INFRA-23882> Added
Docker Hub secrets to the spark-docker repo to enable publishing images to
Docker Hub
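For context, the three DOI quality fixes above roughly correspond to
Dockerfile fragments like the following. This is a hand-written sketch, not
the exact contents of the merged PRs; the package set shown is illustrative,
and keeping UID 185 alongside the spark username is an assumption here:

```dockerfile
# spark-docker#11: run as a named "spark" user rather than the bare
# numeric UID 185, so tools that resolve usernames work in the container.
RUN groupadd --system --gid=185 spark && \
    useradd --system --uid=185 --gid=spark spark

# spark-docker#14: remove the OS package list cache in the same RUN layer
# as the install, so the cache never lands in any image layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends tini procps && \
    rm -rf /var/lib/apt/lists/*

# spark-docker#17: note there is deliberately no
# "pip install --upgrade pip setuptools" step; keeping the distro-pinned
# versions means rebuilding the same Dockerfile yields the same image.

USER spark
```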

## Not merged yet:
- spark-docker#23 <https://github.com/apache/spark-docker/pull/23>
One-click publishing of the "apache/spark" image,
  replacing the current Spark Docker Images publish step
<https://github.com/wangyum/spark-website/blob/1c6b2ee13a1e22748ed416c5cc260c33795a76c8/release-process.md#create-and-upload-spark-docker-images>.
  It will also run the K8s integration and standalone tests before publishing.
- docker-library/official-images#13089
<https://github.com/docker-library/official-images/pull/13089> Add Apache
Spark Docker Official Image;
  waiting for review from the Docker side.

After the above work, I think we have almost reached DOI quality (there may
still be some small fixes needed depending on the Docker-side review), but
we are limited by the Docker side's review bandwidth. The good news is that
the PR is near the top of the review queue, judging by the review history.


# Next step?

Should we publish the apache/spark images (3.3.0/3.3.1) according to the
new rules now?

After publishing, apache/spark will gain several new tags for v3.3.0 and
v3.3.1, such as:

- apache/spark:python3
- apache/spark:scala
- apache/spark:r
- apache/spark (all in one)
* You can see the complete tag info here
<https://github.com/apache/spark-docker/pull/23/files#diff-2b39d33506bc7a34cef4b9ebf4cf8b1e3a5532f2131ceb37011b94261cec5f8c>.
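If we publish under the new rules, users would pull whichever variant they
need, for example (tag names taken from the list above; these tags will not
exist on Docker Hub until the publish workflow actually runs, and the paths
inside the image are assumptions):

```
# Pull the PySpark variant of the proposed image.
docker pull apache/spark:python3

# Start spark-shell from the Scala variant (entrypoint path is an assumption).
docker run -it apache/spark:scala /opt/spark/bin/spark-shell
```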

WDYT?

Regards,
Yikun
