(spark-docker) branch master updated: [SPARK-48664][FOLLOWUP] Update 4.0.0-preview1
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new b1b1019 [SPARK-48664][FOLLOWUP] Update 4.0.0-preview1 b1b1019 is described below commit b1b1019fb8f3034d2b31c407eee1a95301cd4eab Author: Yikun Jiang AuthorDate: Mon Jun 24 09:05:02 2024 +0800 [SPARK-48664][FOLLOWUP] Update 4.0.0-preview1 ### What changes were proposed in this pull request? - Remove `wget -nv -O KEYS https://downloads.apache.org/spark/KEYS` and `gpg --import KEYS;`: it works, but it does not meet the [security concern](https://github.com/docker-library/official-images?tab=readme-ov-file#security) raised by DOI. - Add `add-dockerfiles.sh 4.0.0-preview1` support to address https://github.com/apache/spark-docker/pull/61#issuecomment-2178827705 - Fix `versions.json`: treat java17 as the default version, so the java17 tag can be removed. ### Why are the changes needed? Update 4.0.0-preview1. ### Does this PR introduce _any_ user-facing change? New release. ### How was this patch tested? `./add-dockerfiles.sh 4.0.0-preview1`, no diff. Closes #63 from Yikun/4.0.0-preview1. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile | 2 - README.md | 49 ++ add-dockerfiles.sh| 50 +++ versions.json | 9 ++-- 4 files changed, 86 insertions(+), 24 deletions(-) diff --git a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile index 913f2ad..1102caf 100644 --- a/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile +++ b/4.0.0-preview1/scala2.13-java17-ubuntu/Dockerfile @@ -46,8 +46,6 @@ RUN set -ex; \ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \ export GNUPGHOME="$(mktemp -d)"; \ -wget -nv -O KEYS https://downloads.apache.org/spark/KEYS; \ -gpg --import KEYS; \ gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \ gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \ gpg --batch --verify spark.tgz.asc spark.tgz; \ diff --git a/README.md b/README.md index 87286dc..f34328b 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,55 @@ and Structured Streaming for stream processing. https://spark.apache.org/ +## Create a new version + +### Step 1 Add dockerfiles for a new version. + +You can see [3.4.0 PR](https://github.com/apache/spark-docker/pull/33) as reference. + +- 1.1 Add gpg key to [tools/template.py](https://github.com/apache/spark-docker/blob/master/tools/template.py#L24) + +This gpg key will be used by Dockerfiles (such as [3.4.0](https://github.com/apache/spark-docker/blob/04e85239a8fcc9b3dcfe146bc144ee2b981f8f42/3.4.0/scala2.12-java11-ubuntu/Dockerfile#L41)) to verify the signature of the Apache Spark tarball. 
+ +- 1.2 Add image build workflow (such as [3.4.0 yaml](https://github.com/apache/spark-docker/blob/04e85239a8fcc9b3dcfe146bc144ee2b981f8f42/.github/workflows/build_3.4.0.yaml)) + +This file will be used by GitHub Actions to build the Docker image when you submit the PR, to make sure the dockerfiles are correct and pass all tests (build/standalone/kubernetes). + +- 1.3 Use `./add-dockerfiles.sh [version]` to add Dockerfiles. + +You will get a new directory with the Dockerfiles for the specified version. + +- 1.4 Add version and tag info to versions.json, publish.yml and test.yml. + +This version file will be used by the image build workflow (see the [3.4.0](https://github.com/apache/spark-docker/commit/47c357a52625f482b8b0cb831ccb8c9df523affd) commit as reference) and by the Docker Official Images. + +### Step 2. Publish apache/spark Images. + +Click [Publish (Java 17 only)](https://github.com/apache/spark-docker/actions/workflows/publish-java17.yaml) (for 4.x) or [Publish](https://github.com/apache/spark-docker/actions/workflows/publish.yml) (for 3.x) to publish images. + +After this, the [apache/spark](https://hub.docker.com/r/apache/spark) docker images will be published. + + +### Step 3. Publish spark Docker Official Images. + +Submit the PR to [docker-library/official-images](https://github.com/docker-library/official-images/), see [this PR](https://github.com/docker-library/official-images/pull/15363) as reference. + +Run `tools/manifest.py manifest` to generate the content. + +After this, the [spark](https://hub.docker.com/_/spark) docker images will be published. + +## About images + +| | Apache Spark Image
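The versions.json fix in the commit above treats java17 as the default Java for 4.0.0-preview1, so the bare version tag needs no javaNN suffix. A minimal sketch of that tagging rule, for illustration only — the function name and `default_java` parameter are hypothetical, not the repository's actual tooling:

```python
def versioned_tag(version: str, java: int, default_java: int) -> str:
    """Derive a short image tag: when the Java version is the default
    for this release line, the javaNN component is dropped entirely."""
    return version if java == default_java else f"{version}-java{java}"

# 4.0.0-preview1 ships Java 17 images only, so java17 is the default:
print(versioned_tag("4.0.0-preview1", 17, default_java=17))  # → 4.0.0-preview1
# 3.5.1 defaults to Java 11, so its Java 17 images keep an explicit suffix:
print(versioned_tag("3.5.1", 17, default_java=11))  # → 3.5.1-java17
```

This matches the published tag sets: 3.5.1 has both `3.5.1` and `3.5.1-java17` tags, while 4.0.0-preview1 drops the `java17` tag.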
(spark-docker) branch master updated: [SPARK-47206][FOLLOWUP] Fix wrong path version
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 4f2d96a [SPARK-47206][FOLLOWUP] Fix wrong path version 4f2d96a is described below commit 4f2d96a415c89cfe0fde89a55e9034d095224c94 Author: Yikun Jiang AuthorDate: Thu Feb 29 09:49:01 2024 +0800 [SPARK-47206][FOLLOWUP] Fix wrong path version ### What changes were proposed in this pull request? Fix wrong path version. ### Why are the changes needed? This will be used by https://github.com/docker-library/official-images . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? ``` $ tools/manifest.py manifest Maintainers: Apache Spark Developers (ApacheSpark) GitRepo: https://github.com/apache/spark-docker.git Tags: 3.5.1-scala2.12-java17-python3-ubuntu, 3.5.1-java17-python3, 3.5.1-java17, python3-java17 Architectures: amd64, arm64v8 GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676 Directory: ./3.5.1/scala2.12-java17-python3-ubuntu Tags: 3.5.1-scala2.12-java17-r-ubuntu, 3.5.1-java17-r Architectures: amd64, arm64v8 GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676 Directory: ./3.5.1/scala2.12-java17-r-ubuntu Tags: 3.5.1-scala2.12-java17-ubuntu, 3.5.1-java17-scala Architectures: amd64, arm64v8 GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676 Directory: ./3.5.1/scala2.12-java17-ubuntu Tags: 3.5.1-scala2.12-java17-python3-r-ubuntu Architectures: amd64, arm64v8 GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676 Directory: ./3.5.1/scala2.12-java17-python3-r-ubuntu Tags: 3.5.1-scala2.12-java11-python3-ubuntu, 3.5.1-python3, 3.5.1, python3, latest Architectures: amd64, arm64v8 GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676 Directory: ./3.5.1/scala2.12-java11-python3-ubuntu Tags: 3.5.1-scala2.12-java11-r-ubuntu, 3.5.1-r, r Architectures: amd64, arm64v8 GitCommit: 
8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676 Directory: ./3.5.1/scala2.12-java11-r-ubuntu Tags: 3.5.1-scala2.12-java11-ubuntu, 3.5.1-scala, scala Architectures: amd64, arm64v8 GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676 Directory: ./3.5.1/scala2.12-java11-ubuntu Tags: 3.5.1-scala2.12-java11-python3-r-ubuntu Architectures: amd64, arm64v8 GitCommit: 8b4329162bbbd1ce5c9d885a1edcd6d61ebcc676 Directory: ./3.5.1/scala2.12-java11-python3-r-ubuntu ``` Closes #60 from Yikun/3.5.1-follow. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- versions.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/versions.json b/versions.json index 3d3e3b9..6ea6d71 100644 --- a/versions.json +++ b/versions.json @@ -30,7 +30,7 @@ ] }, { - "path": "3.5.0/scala2.12-java11-python3-ubuntu", + "path": "3.5.1/scala2.12-java11-python3-ubuntu", "tags": [ "3.5.1-scala2.12-java11-python3-ubuntu", "3.5.1-python3", - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
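The bug fixed above — a versions.json entry whose `path` still said 3.5.0 while all of its tags said 3.5.1 — is easy to catch mechanically. A hypothetical checker sketch; the helper name and sample data are illustrative, not part of the repo's tooling:

```python
import re

def find_path_tag_mismatches(entries):
    """Flag versions.json entries whose leading path version directory
    disagrees with the version prefix of any of their tags."""
    mismatches = []
    for entry in entries:
        path_version = entry["path"].split("/")[0]  # e.g. "3.5.1"
        for tag in entry["tags"]:
            m = re.match(r"\d+\.\d+\.\d+", tag)
            if m and m.group(0) != path_version:
                mismatches.append((entry["path"], tag))
    return mismatches

# The exact inconsistency this commit fixed:
broken = [{"path": "3.5.0/scala2.12-java11-python3-ubuntu",
           "tags": ["3.5.1-scala2.12-java11-python3-ubuntu", "3.5.1-python3",
                    "3.5.1", "python3", "latest"]}]
for path, tag in find_path_tag_mismatches(broken):
    print(f"{path}: tag {tag} disagrees with path version")
```

Unversioned tags such as `python3` and `latest` carry no version prefix, so they are skipped.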
(spark-docker) branch master updated: [SPARK-47206] Add official image Dockerfile for Apache Spark 3.5.1
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 7216374 [SPARK-47206] Add official image Dockerfile for Apache Spark 3.5.1 7216374 is described below commit 7216374855ba57ce14c8ddbf56890538f678ec3d Author: Yikun Jiang AuthorDate: Thu Feb 29 08:55:47 2024 +0800 [SPARK-47206] Add official image Dockerfile for Apache Spark 3.5.1 ### What changes were proposed in this pull request? Add Apache Spark 3.5.1 Dockerfiles. - Add 3.5.1 GPG key - Add .github/workflows/build_3.5.1.yaml - `./add-dockerfiles.sh 3.5.1` to generate dockerfiles - Add version and tag info ### Why are the changes needed? Apache Spark 3.5.1 released ### Does this PR introduce _any_ user-facing change? Docker image will be published. ### How was this patch tested? Add workflow and CI passed Closes #59 from Yikun/3.5.1. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.5.1.yaml | 43 +++ .github/workflows/publish.yml | 4 +- .github/workflows/test.yml | 3 +- 3.5.1/scala2.12-java11-python3-r-ubuntu/Dockerfile | 29 + 3.5.1/scala2.12-java11-python3-ubuntu/Dockerfile | 26 + 3.5.1/scala2.12-java11-r-ubuntu/Dockerfile | 28 + 3.5.1/scala2.12-java11-ubuntu/Dockerfile | 79 + 3.5.1/scala2.12-java11-ubuntu/entrypoint.sh| 130 + 3.5.1/scala2.12-java17-python3-r-ubuntu/Dockerfile | 29 + 3.5.1/scala2.12-java17-python3-ubuntu/Dockerfile | 26 + 3.5.1/scala2.12-java17-r-ubuntu/Dockerfile | 28 + 3.5.1/scala2.12-java17-ubuntu/Dockerfile | 79 + 3.5.1/scala2.12-java17-ubuntu/entrypoint.sh| 130 + tools/template.py | 4 +- versions.json | 74 ++-- 15 files changed, 699 insertions(+), 13 deletions(-) diff --git a/.github/workflows/build_3.5.1.yaml b/.github/workflows/build_3.5.1.yaml new file mode 100644 index 000..65a8d5d --- /dev/null +++ b/.github/workflows/build_3.5.1.yaml @@ -0,0 +1,43 @@ +# +# 
Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +name: "Build and Test (3.5.1)" + +on: + pull_request: +branches: + - 'master' +paths: + - '3.5.1/**' + +jobs: + run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] +java: [11, 17] +name: Run +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: 3.5.1 + scala: 2.12 + java: ${{ matrix.java }} + image-type: ${{ matrix.image-type }} + diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 2f828a4..5dfc210 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -25,10 +25,10 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.5.0' +default: '3.5.1' type: choice options: -- 3.5.0 +- 3.5.1 publish: description: 'Publish the image or not.' default: false diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index df79364..9c08b33 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -25,9 +25,10 @@ on: spark: description: 'The Spark version of Spark image.' 
required: true -default: '3.5.0' +default: '3.5.1' type: choice options: +- 3.5.1 - 3.5.0 - 3.4.2 - 3.4.1 diff --git a/3.5.1/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.5.1/scala2.12-java11-python3-r-ubuntu/Dockerfile new file mode 100644 index 000..57c044b --- /dev/null +++ b/3.5.1/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -0,0 +1,29 @@ +# +# Licensed to the Apa
(spark-docker) branch master updated: [SPARK-46209] Add java 11 only yml for version before 3.5
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 431aa51 [SPARK-46209] Add java 11 only yml for version before 3.5 431aa51 is described below commit 431aa516ba58985c902bf2d2a07bf0eaa1df6740 Author: Yikun Jiang AuthorDate: Sat Dec 2 20:36:29 2023 +0800 [SPARK-46209] Add java 11 only yml for version before 3.5 ### What changes were proposed in this pull request? Add a Java 11-only workflow for versions before 3.5.0. ### Why are the changes needed? Otherwise, publishing will fail because no Java 17 files exist for versions before 3.5.0. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested on my repo: https://github.com/Yikun/spark-docker/actions/workflows/publish-java11.yml Closes #58 from Yikun/java11-publish. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/{publish.yml => publish-java11.yml} | 9 - .github/workflows/publish.yml | 7 --- 2 files changed, 4 insertions(+), 12 deletions(-) diff --git a/.github/workflows/publish.yml b/.github/workflows/publish-java11.yml similarity index 96% copy from .github/workflows/publish.yml copy to .github/workflows/publish-java11.yml index ec0d66c..caa3702 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish-java11.yml @@ -17,7 +17,7 @@ # under the License. # -name: "Publish" +name: "Publish (Java 11 only)" on: workflow_dispatch: @@ -25,10 +25,9 @@ on: spark: description: 'The Spark version of Spark image.' 
required: true -default: '3.5.0' +default: '3.4.2' type: choice options: -- 3.5.0 - 3.4.2 - 3.4.1 - 3.4.0 @@ -59,7 +58,7 @@ jobs: strategy: matrix: scala: [2.12] -java: [11, 17] +java: [11] image-type: ["scala"] permissions: packages: write @@ -81,7 +80,7 @@ jobs: strategy: matrix: scala: [2.12] -java: [11, 17] +java: [11] image-type: ["all", "python", "r"] permissions: packages: write diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index ec0d66c..2f828a4 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -29,13 +29,6 @@ on: type: choice options: - 3.5.0 -- 3.4.2 -- 3.4.1 -- 3.4.0 -- 3.3.3 -- 3.3.2 -- 3.3.1 -- 3.3.0 publish: description: 'Publish the image or not.' default: false
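The publish.yml / publish-java11.yml split encodes one rule: Java 17 images exist only from Spark 3.5.0 onward, so older release lines must be published with Java 11 alone. A sketch of that rule as a helper — hypothetical, since the workflows hard-code it in their `java:` matrices instead:

```python
def java_versions_for(spark_version: str) -> list[int]:
    """Java versions spark-docker publishes images for: Java 17 builds
    were introduced with Spark 3.5.0, hence the separate Java 11-only
    publish workflow for earlier release lines (SPARK-46209)."""
    parts = spark_version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    return [11, 17] if (major, minor) >= (3, 5) else [11]

print(java_versions_for("3.4.2"))  # → [11]
print(java_versions_for("3.5.1"))  # → [11, 17]
```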
(spark-docker) branch master updated: [SPARK-46185] Add official image Dockerfile for Apache Spark 3.4.2
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new ec69b9c [SPARK-46185] Add official image Dockerfile for Apache Spark 3.4.2 ec69b9c is described below commit ec69b9c77bc733ed5937f5068d23f7407eb51ea9 Author: Yikun Jiang AuthorDate: Sat Dec 2 10:00:48 2023 +0800 [SPARK-46185] Add official image Dockerfile for Apache Spark 3.4.2 ### What changes were proposed in this pull request? Add Apache Spark 3.4.2 Dockerfiles. - Add 3.4.2 GPG key - Add .github/workflows/build_3.4.2.yaml - `./add-dockerfiles.sh 3.4.2` to generate dockerfiles (and remove master changes: https://github.com/apache/spark-docker/pull/55/commits/24cbf40abdc252fdcf48303efa33ba7f84adefaf) - Add version and tag info ### Why are the changes needed? Apache Spark 3.4.2 released ### Does this PR introduce _any_ user-facing change? Docker image will be published. ### How was this patch tested? Add workflow and CI passed Closes #57 from Yikun/3.4.2. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.4.2.yaml | 41 +++ .github/workflows/publish.yml | 1 + .github/workflows/test.yml | 1 + 3.4.2/scala2.12-java11-python3-r-ubuntu/Dockerfile | 29 + 3.4.2/scala2.12-java11-python3-ubuntu/Dockerfile | 26 + 3.4.2/scala2.12-java11-r-ubuntu/Dockerfile | 28 + 3.4.2/scala2.12-java11-ubuntu/Dockerfile | 79 + 3.4.2/scala2.12-java11-ubuntu/entrypoint.sh| 126 + tools/template.py | 2 + versions.json | 28 + 10 files changed, 361 insertions(+) diff --git a/.github/workflows/build_3.4.2.yaml b/.github/workflows/build_3.4.2.yaml new file mode 100644 index 000..8ae17d1 --- /dev/null +++ b/.github/workflows/build_3.4.2.yaml @@ -0,0 +1,41 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +name: "Build and Test (3.4.2)" + +on: + pull_request: +branches: + - 'master' +paths: + - '3.4.2/**' + +jobs: + run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] +name: Run +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: 3.4.2 + scala: 2.12 + java: 11 + image-type: ${{ matrix.image-type }} diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 879a9c2..ec0d66c 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -29,6 +29,7 @@ on: type: choice options: - 3.5.0 +- 3.4.2 - 3.4.1 - 3.4.0 - 3.3.3 diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 689981a..df79364 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -29,6 +29,7 @@ on: type: choice options: - 3.5.0 +- 3.4.2 - 3.4.1 - 3.4.0 - 3.3.3 diff --git a/3.4.2/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.4.2/scala2.12-java11-python3-r-ubuntu/Dockerfile new file mode 100644 index 000..7c7e96a --- /dev/null +++ b/3.4.2/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -0,0 +1,29 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS"
(spark-docker) branch master updated: Add support for java 17 from spark 3.5.0
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 6f68fe0 Add support for java 17 from spark 3.5.0 6f68fe0 is described below commit 6f68fe0f7051c10f2bf43a50a7decfce2e97baf0 Author: vakarisbk AuthorDate: Fri Nov 10 11:33:39 2023 +0800 Add support for java 17 from spark 3.5.0 ### What changes were proposed in this pull request? 1. Create Java17 base images alongside Java11 images starting from spark 3.5.0 2. Change ubuntu version to 22.04 for `scala2.12-java17-*` ### Why are the changes needed? Spark supports multiple Java versions, but the images are currently built only with Java 11. ### Does this PR introduce _any_ user-facing change? New images would be available in the repositories. ### How was this patch tested? Closes #56 from vakarisbk/master. Authored-by: vakarisbk Signed-off-by: Yikun Jiang --- .github/workflows/build_3.5.0.yaml | 3 +- .github/workflows/main.yml | 20 +++- .github/workflows/publish.yml | 4 +- .github/workflows/test.yml | 3 + 3.5.0/scala2.12-java17-python3-r-ubuntu/Dockerfile | 29 + 3.5.0/scala2.12-java17-python3-ubuntu/Dockerfile | 26 + 3.5.0/scala2.12-java17-r-ubuntu/Dockerfile | 28 + 3.5.0/scala2.12-java17-ubuntu/Dockerfile | 79 + 3.5.0/scala2.12-java17-ubuntu/entrypoint.sh| 130 + add-dockerfiles.sh | 23 +++- tools/ci_runner_cleaner/free_disk_space.sh | 53 + .../ci_runner_cleaner/free_disk_space_container.sh | 33 ++ tools/template.py | 2 +- versions.json | 29 + 14 files changed, 454 insertions(+), 8 deletions(-) diff --git a/.github/workflows/build_3.5.0.yaml b/.github/workflows/build_3.5.0.yaml index 6eb3ad6..9f2b2d6 100644 --- a/.github/workflows/build_3.5.0.yaml +++ b/.github/workflows/build_3.5.0.yaml @@ -31,11 +31,12 @@ jobs: strategy: matrix: image-type: ["all", "python", "scala", "r"] +java: [11, 17] name: Run secrets: inherit uses: 
./.github/workflows/main.yml with: spark: 3.5.0 scala: 2.12 - java: 11 + java: ${{ matrix.java }} image-type: ${{ matrix.image-type }} diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index fe755ed..145b529 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -79,6 +79,14 @@ jobs: - name: Checkout Spark Docker repository uses: actions/checkout@v3 + - name: Free up disk space +shell: 'script -q -e -c "bash {0}"' +run: | + chmod +x tools/ci_runner_cleaner/free_disk_space_container.sh + tools/ci_runner_cleaner/free_disk_space_container.sh + chmod +x tools/ci_runner_cleaner/free_disk_space.sh + tools/ci_runner_cleaner/free_disk_space.sh + - name: Prepare - Generate tags run: | case "${{ inputs.image-type }}" in @@ -195,7 +203,8 @@ jobs: - name : Test - Run spark application for standalone cluster on docker run: testing/run_tests.sh --image-url $IMAGE_URL --scala-version ${{ inputs.scala }} --spark-version ${{ inputs.spark }} - - name: Test - Checkout Spark repository + - name: Test - Checkout Spark repository for Spark 3.3.0 (with fetch-depth 0) +if: inputs.spark == '3.3.0' uses: actions/checkout@v3 with: fetch-depth: 0 @@ -203,6 +212,14 @@ jobs: ref: v${{ inputs.spark }} path: ${{ github.workspace }}/spark + - name: Test - Checkout Spark repository +if: inputs.spark != '3.3.0' +uses: actions/checkout@v3 +with: + repository: apache/spark + ref: v${{ inputs.spark }} + path: ${{ github.workspace }}/spark + - name: Test - Cherry pick commits # Apache Spark enable resource limited k8s IT since v3.3.1, cherry-pick patches for old release # https://github.com/apache/spark/pull/36087#issuecomment-1251756266 @@ -247,6 +264,7 @@ jobs: # TODO(SPARK-44495): Resume to use the latest minikube for k8s-integration-tests. 
curl -LO https://storage.googleapis.com/minikube/releases/v1.30.1/minikube-linux-amd64 sudo install minikube-linux-amd64 /usr/local/bin/minikube + rm minikube-linux-amd64 # Github Action limit cpu:2, memory: 6947MB, limit to 2U6G for better resource statistic minikube start --cpus 2 --memory 6144 diff --git a/.github
[spark-docker] branch master updated: [SPARK-45169] Add official image Dockerfile for Apache Spark 3.5.0
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 028efd4 [SPARK-45169] Add official image Dockerfile for Apache Spark 3.5.0 028efd4 is described below commit 028efd4637fb2cf791d5bd9ea70b2fca472de4b7 Author: Yikun Jiang AuthorDate: Thu Sep 14 21:22:32 2023 +0800 [SPARK-45169] Add official image Dockerfile for Apache Spark 3.5.0 ### What changes were proposed in this pull request? Add Apache Spark 3.5.0 Dockerfiles. - Add 3.5.0 GPG key - Add .github/workflows/build_3.5.0.yaml - `./add-dockerfiles.sh 3.5.0` to generate dockerfiles - Add version and tag info - Backport https://github.com/apache/spark/commit/1d2c338c867c69987d8ed1f3666358af54a040e3 and https://github.com/apache/spark/commit/0c7b4306c7c5fbdd6c54f8172f82e1d23e3b entrypoint changes ### Why are the changes needed? Apache Spark 3.5.0 released ### Does this PR introduce _any_ user-facing change? Docker image will be published. ### How was this patch tested? Add workflow and CI passed Closes #55 from Yikun/3.5.0. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.5.0.yaml | 41 +++ .github/workflows/publish.yml | 3 +- .github/workflows/test.yml | 3 +- 3.5.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 29 3.5.0/scala2.12-java11-python3-ubuntu/Dockerfile | 26 +++ 3.5.0/scala2.12-java11-r-ubuntu/Dockerfile | 28 3.5.0/scala2.12-java11-ubuntu/Dockerfile | 79 ++ .../scala2.12-java11-ubuntu/entrypoint.sh | 4 ++ entrypoint.sh.template | 4 ++ tools/template.py | 4 +- versions.json | 42 ++-- 11 files changed, 253 insertions(+), 10 deletions(-) diff --git a/.github/workflows/build_3.5.0.yaml b/.github/workflows/build_3.5.0.yaml new file mode 100644 index 000..6eb3ad6 --- /dev/null +++ b/.github/workflows/build_3.5.0.yaml @@ -0,0 +1,41 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+# + +name: "Build and Test (3.5.0)" + +on: + pull_request: +branches: + - 'master' +paths: + - '3.5.0/**' + +jobs: + run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] +name: Run +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: 3.5.0 + scala: 2.12 + java: 11 + image-type: ${{ matrix.image-type }} diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index d213ada..8cfa95d 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -25,9 +25,10 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.4.1' +default: '3.5.0' type: choice options: +- 3.5.0 - 3.4.1 - 3.4.0 - 3.3.3 diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 4f0f741..47dac20 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -25,9 +25,10 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.4.1' +default: '3.5.0' type: choice options: +- 3.5.0 - 3.4.1 - 3.4.0 - 3.3.3 diff --git a/3.5.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.5.0/scala2.12-java11-python3-r-ubuntu/Dockerfile new file mode 100644 index 000..d6faaa7 --- /dev/null +++ b/3.5.0/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -0,0 +1,29 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership.
[spark-docker] branch master updated: [SPARK-44494] Pin minikube to v1.30.1 to fix spark-docker K8s CI
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 6fd201e [SPARK-44494] Pin minikube to v1.30.1 to fix spark-docker K8s CI 6fd201e is described below commit 6fd201e7c6e6a36c7a18e3b5877c3616081a05cf Author: Yikun Jiang AuthorDate: Thu Aug 17 15:30:59 2023 +0800 [SPARK-44494] Pin minikube to v1.30.1 to fix spark-docker K8s CI ### What changes were proposed in this pull request? Pin minikube to v1.30.1 to fix spark-docker K8s CI. ### Why are the changes needed? Pin minikube to v1.30.1 to fix spark-docker K8s CI ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #53 from Yikun/minikube. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/main.yml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 870c8c7..fe755ed 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -243,7 +243,9 @@ jobs: - name: Test - Start minikube run: | # See more in "Installation" https://minikube.sigs.k8s.io/docs/start/ - curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 + # curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 + # TODO(SPARK-44495): Resume to use the latest minikube for k8s-integration-tests. + curl -LO https://storage.googleapis.com/minikube/releases/v1.30.1/minikube-linux-amd64 sudo install minikube-linux-amd64 /usr/local/bin/minikube # Github Action limit cpu:2, memory: 6947MB, limit to 2U6G for better resource statistic minikube start --cpus 2 --memory 6144
[spark-docker] branch master updated: [SPARK-40513] Add --batch to gpg command
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 58d2885 [SPARK-40513] Add --batch to gpg command 58d2885 is described below commit 58d288546e8419d229f14b62b6a653999e0390f1 Author: Yikun Jiang AuthorDate: Thu Jun 29 16:05:47 2023 +0800 [SPARK-40513] Add --batch to gpg command ### What changes were proposed in this pull request? Add --batch to gpg command which essentially puts GnuPG into "API mode" instead of "UI mode". Apply changes to 3.4.x dockerfile. ### Why are the changes needed? Address DOI comments: https://github.com/docker-library/official-images/pull/13089#issuecomment-1611814491 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #51 from Yikun/batch. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 4 ++-- 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 4 ++-- Dockerfile.template | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-ubuntu/Dockerfile index 854f86c..a4b081e 100644 --- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile @@ -46,8 +46,8 @@ RUN set -ex; \ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \ export GNUPGHOME="$(mktemp -d)"; \ -gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \ -gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \ +gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \ +gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \ gpg --batch --verify spark.tgz.asc spark.tgz; \ gpgconf --kill all; \ rm -rf "$GNUPGHOME" spark.tgz.asc; \ diff --git a/3.4.1/scala2.12-java11-ubuntu/Dockerfile 
b/3.4.1/scala2.12-java11-ubuntu/Dockerfile index 6d62769..d8bba7e 100644 --- a/3.4.1/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.1/scala2.12-java11-ubuntu/Dockerfile @@ -46,8 +46,8 @@ RUN set -ex; \ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \ export GNUPGHOME="$(mktemp -d)"; \ -gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \ -gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \ +gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \ +gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \ gpg --batch --verify spark.tgz.asc spark.tgz; \ gpgconf --kill all; \ rm -rf "$GNUPGHOME" spark.tgz.asc; \ diff --git a/Dockerfile.template b/Dockerfile.template index 80b57e2..3d0aacf 100644 --- a/Dockerfile.template +++ b/Dockerfile.template @@ -46,8 +46,8 @@ RUN set -ex; \ wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \ wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \ export GNUPGHOME="$(mktemp -d)"; \ -gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \ -gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \ +gpg --batch --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \ +gpg --batch --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \ gpg --batch --verify spark.tgz.asc spark.tgz; \ gpgconf --kill all; \ rm -rf "$GNUPGHOME" spark.tgz.asc; \
[spark-docker] branch master updated: [SPARK-44168][FOLLOWUP] Change v3.4 GPG_KEY to full key fingerprint
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 39264c5 [SPARK-44168][FOLLOWUP] Change v3.4 GPG_KEY to full key fingerprint 39264c5 is described below commit 39264c502cf21b71a1ab5da71760e5864abce099 Author: Yikun Jiang AuthorDate: Thu Jun 29 16:04:50 2023 +0800 [SPARK-44168][FOLLOWUP] Change v3.4 GPG_KEY to full key fingerprint ### What changes were proposed in this pull request? Change the GPG key from `34F0FC5C` to `F28C9C925C188C35E345614DEDA00CE834F0FC5C` to avoid a potential collision. The full fingerprint can be obtained with the commands below: ``` $ wget https://dist.apache.org/repos/dist/dev/spark/KEYS $ gpg --import KEYS $ gpg --fingerprint 34F0FC5C pub rsa4096 2015-05-05 [SC] F28C 9C92 5C18 8C35 E345 614D EDA0 0CE8 34F0 FC5C uid [ unknown] Dongjoon Hyun (CODE SIGNING KEY) sub rsa4096 2015-05-05 [E] ``` ### Why are the changes needed? - A short gpg key had been added as the v3.4.1 gpg key in https://github.com/apache/spark-docker/pull/46 . - The short key `34F0FC5C` is from https://dist.apache.org/repos/dist/dev/spark/KEYS - According to the DOI review comment, https://github.com/docker-library/official-images/pull/13089#issuecomment-1609990551 , `this should be the full key fingerprint: F28C9C925C188C35E345614DEDA00CE834F0FC5C (generating a collision for such a short key ID is trivial.` - We'd better switch the short key to the full fingerprint ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #50 from Yikun/gpg_key.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 2 +- tools/template.py| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/3.4.1/scala2.12-java11-ubuntu/Dockerfile b/3.4.1/scala2.12-java11-ubuntu/Dockerfile index bf106a6..6d62769 100644 --- a/3.4.1/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.1/scala2.12-java11-ubuntu/Dockerfile @@ -38,7 +38,7 @@ RUN set -ex; \ # https://downloads.apache.org/spark/KEYS ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz \ SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz.asc \ -GPG_KEY=34F0FC5C +GPG_KEY=F28C9C925C188C35E345614DEDA00CE834F0FC5C RUN set -ex; \ export SPARK_TMP="$(mktemp -d)"; \ diff --git a/tools/template.py b/tools/template.py index 93e842a..cdc167c 100755 --- a/tools/template.py +++ b/tools/template.py @@ -31,7 +31,7 @@ GPG_KEY_DICT = { # issuer "xinr...@apache.org" "3.4.0": "CC68B3D16FE33A766705160BA7E57908C7A4E1B1", # issuer "dongj...@apache.org" -"3.4.1": "34F0FC5C" +"3.4.1": "F28C9C925C188C35E345614DEDA00CE834F0FC5C" }
[spark-docker] branch master updated: [SPARK-40513][DOCS] Add apache/spark docker image overview
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new d02ff60 [SPARK-40513][DOCS] Add apache/spark docker image overview d02ff60 is described below commit d02ff6091835311a32c7ccc73d8ebae1d5817ecc Author: Yikun Jiang AuthorDate: Tue Jun 27 14:28:21 2023 +0800 [SPARK-40513][DOCS] Add apache/spark docker image overview ### What changes were proposed in this pull request? This PR add the `OVERVIEW.md`. ### Why are the changes needed? This will be used in the page of https://hub.docker.com/r/apache/spark to introduce the spark docker image and tag info. ### Does this PR introduce _any_ user-facing change? Yes, doc only ### How was this patch tested? Doc only, review. Closes #34 from Yikun/overview. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- OVERVIEW.md | 83 + 1 file changed, 83 insertions(+) diff --git a/OVERVIEW.md b/OVERVIEW.md new file mode 100644 index 000..046 --- /dev/null +++ b/OVERVIEW.md @@ -0,0 +1,83 @@ +# What is Apache Spark™? + +Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structu [...] + +https://spark.apache.org/ + +## Online Documentation + +You can find the latest Spark documentation, including a programming guide, on the [project web page](https://spark.apache.org/documentation.html). This README file only contains basic setup instructions. 
+ +## Interactive Scala Shell + +The easiest way to start using Spark is through the Scala shell: + +``` +docker run -it apache/spark /opt/spark/bin/spark-shell +``` + +Try the following command, which should return 1,000,000,000: + +``` +scala> spark.range(1000 * 1000 * 1000).count() +``` + +## Interactive Python Shell + +The easiest way to start using PySpark is through the Python shell: + +``` +docker run -it apache/spark /opt/spark/bin/pyspark +``` + +And run the following command, which should also return 1,000,000,000: + +``` +>>> spark.range(1000 * 1000 * 1000).count() +``` + +## Interactive R Shell + +The easiest way to start using R on Spark is through the R shell: + +``` +docker run -it apache/spark:r /opt/spark/bin/sparkR +``` + +## Running Spark on Kubernetes + +https://spark.apache.org/docs/latest/running-on-kubernetes.html + +## Supported tags and respective Dockerfile links + +Currently, the `apache/spark` docker image supports 4 types for each version: + +Such as for v3.4.0: +- [3.4.0-scala2.12-java11-python3-ubuntu, 3.4.0-python3, 3.4.0, python3, latest](https://github.com/apache/spark-docker/tree/fe05e38f0ffad271edccd6ae40a77d5f14f3eef7/3.4.0/scala2.12-java11-python3-ubuntu) +- [3.4.0-scala2.12-java11-r-ubuntu, 3.4.0-r, r](https://github.com/apache/spark-docker/tree/fe05e38f0ffad271edccd6ae40a77d5f14f3eef7/3.4.0/scala2.12-java11-r-ubuntu) +- [3.4.0-scala2.12-java11-ubuntu, 3.4.0-scala, scala](https://github.com/apache/spark-docker/tree/fe05e38f0ffad271edccd6ae40a77d5f14f3eef7/3.4.0/scala2.12-java11-ubuntu) +- [3.4.0-scala2.12-java11-python3-r-ubuntu](https://github.com/apache/spark-docker/tree/fe05e38f0ffad271edccd6ae40a77d5f14f3eef7/3.4.0/scala2.12-java11-python3-r-ubuntu) + +## Environment Variable + +The environment variables of entrypoint.sh are listed below: + +| Environment Variable | Meaning | +|--|---| +| SPARK_EXTRA_CLASSPATH | The extra path to be added to the classpath, see also in 
https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management | +| PYSPARK_PYTHON | Python binary executable to use for PySpark in both driver and workers (default is python3 if available, otherwise python). Property spark.pyspark.python takes precedence if it is set | +| PYSPARK_DRIVER_PYTHON | Python binary executable to use for PySpark in driver only (default is PYSPARK_PYTHON). Property spark.pyspark.driver.python takes precedence if it is set | +| SPARK_DIST_CLASSPATH | Distribution-defined classpath to add to processes | +| SPARK_DRIVER_BIND_ADDRESS | Hostname or IP address where to bind listening sockets. See also `spark.driver.bindAddress` | +| SPARK_EXECUTOR_JAVA_OPTS | The Java opts of Spark Executor | +| SPARK_APPLICATION_ID | A unique identifier for the Spark application | +| SPARK_EXECUTOR_POD_IP | The Pod IP address of spark executor | +| SPARK_RESOURC
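The PYSPARK_PYTHON defaulting described in the table can be sketched in plain shell. This is illustrative only; the real entrypoint simply exports the variable when it is set, and the `/usr/bin/python3.11` path is a hypothetical override, not something from the repo:

```shell
# Sketch of the "default is python3 if available" fallback behavior.
unset PYSPARK_PYTHON
echo "python binary: ${PYSPARK_PYTHON:-python3}"   # unset -> falls back to python3

PYSPARK_PYTHON=/usr/bin/python3.11                 # hypothetical user override
echo "python binary: ${PYSPARK_PYTHON:-python3}"   # set -> the override wins
```

The `${VAR:-default}` expansion is the idiomatic way to express "use VAR if set, else a default" in POSIX shell.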
[spark-docker] branch master updated: [SPARK-44175] Remove useless lib64 path link in dockerfile
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 5405b49 [SPARK-44175] Remove useless lib64 path link in dockerfile 5405b49 is described below commit 5405b49b52aa1661d31ac80cdb8c9aad530d6847 Author: Yikun Jiang AuthorDate: Tue Jun 27 14:09:34 2023 +0800 [SPARK-44175] Remove useless lib64 path link in dockerfile ### What changes were proposed in this pull request? Remove the useless lib64 path ### Why are the changes needed? Address comments: https://github.com/docker-library/official-images/pull/13089#issuecomment-1601813499 It was introduced by https://github.com/apache/spark/commit/f13ea15d79fb4752a0a75a05a4a89bd8625ea3d5 to address a snappy issue on Alpine OS, but we already switched the OS to Ubuntu, so the `/lib64` hack can be cleaned up. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #48 from Yikun/rm-lib64-hack.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 1 - 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 1 - Dockerfile.template | 1 - 3 files changed, 3 deletions(-) diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-ubuntu/Dockerfile index 77ace47..854f86c 100644 --- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile @@ -23,7 +23,6 @@ RUN groupadd --system --gid=${spark_uid} spark && \ RUN set -ex; \ apt-get update; \ -ln -s /lib /lib64; \ apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu libnss-wrapper; \ mkdir -p /opt/spark; \ mkdir /opt/spark/python; \ diff --git a/3.4.1/scala2.12-java11-ubuntu/Dockerfile b/3.4.1/scala2.12-java11-ubuntu/Dockerfile index e782686..bf106a6 100644 --- a/3.4.1/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.1/scala2.12-java11-ubuntu/Dockerfile @@ -23,7 +23,6 @@ RUN groupadd --system --gid=${spark_uid} spark && \ RUN set -ex; \ apt-get update; \ -ln -s /lib /lib64; \ apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu libnss-wrapper; \ mkdir -p /opt/spark; \ mkdir /opt/spark/python; \ diff --git a/Dockerfile.template b/Dockerfile.template index 6fedce9..80b57e2 100644 --- a/Dockerfile.template +++ b/Dockerfile.template @@ -23,7 +23,6 @@ RUN groupadd --system --gid=${spark_uid} spark && \ RUN set -ex; \ apt-get update; \ -ln -s /lib /lib64; \ apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu libnss-wrapper; \ mkdir -p /opt/spark; \ mkdir /opt/spark/python; \
[spark-docker] branch master updated: [SPARK-44177] Add 'set -eo pipefail' to entrypoint and quote variables
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 6022289 [SPARK-44177] Add 'set -eo pipefail' to entrypoint and quote variables 6022289 is described below commit 60222892836549f05c56edd49ac81c688c8e7356 Author: Yikun Jiang AuthorDate: Tue Jun 27 08:59:03 2023 +0800 [SPARK-44177] Add 'set -eo pipefail' to entrypoint and quote variables ### What changes were proposed in this pull request? Add 'set -eo pipefail' to entrypoint and quote variables ### Why are the changes needed? Address DOI comments: 1. Have you considered a set -eo pipefail on the entrypoint script to help prevent any errors from being silently ignored? 2. You probably want to quote this (and many of the other variables in this execution); ala --driver-url "$SPARK_DRIVER_URL" [1] https://github.com/docker-library/official-images/pull/13089#issuecomment-1601334895 [2] https://github.com/docker-library/official-images/pull/13089#issuecomment-1601813499 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #49 from Yikun/quote. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh | 31 - 3.4.1/scala2.12-java11-ubuntu/entrypoint.sh | 31 - entrypoint.sh.template | 31 - 3 files changed, 51 insertions(+), 42 deletions(-) diff --git a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh index 08fc925..2e3d2a8 100755 --- a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh +++ b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh @@ -15,6 +15,9 @@ # See the License for the specific language governing permissions and # limitations under the License. 
# +# Prevent any errors from being silently ignored +set -eo pipefail + attempt_setup_fake_passwd_entry() { # Check whether there is a passwd entry for the container UID local myuid; myuid="$(id -u)" @@ -51,10 +54,10 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH" fi -if ! [ -z ${PYSPARK_PYTHON+x} ]; then +if ! [ -z "${PYSPARK_PYTHON+x}" ]; then export PYSPARK_PYTHON fi -if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then +if ! [ -z "${PYSPARK_DRIVER_PYTHON+x}" ]; then export PYSPARK_DRIVER_PYTHON fi @@ -64,13 +67,13 @@ if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)" fi -if ! [ -z ${HADOOP_CONF_DIR+x} ]; then +if ! [ -z "${HADOOP_CONF_DIR+x}" ]; then SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH"; fi -if ! [ -z ${SPARK_CONF_DIR+x} ]; then +if ! [ -z "${SPARK_CONF_DIR+x}" ]; then SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH"; -elif ! [ -z ${SPARK_HOME+x} ]; then +elif ! 
[ -z "${SPARK_HOME+x}" ]; then SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH"; fi @@ -99,17 +102,17 @@ case "$1" in CMD=( ${JAVA_HOME}/bin/java "${SPARK_EXECUTOR_JAVA_OPTS[@]}" - -Xms$SPARK_EXECUTOR_MEMORY - -Xmx$SPARK_EXECUTOR_MEMORY + -Xms"$SPARK_EXECUTOR_MEMORY" + -Xmx"$SPARK_EXECUTOR_MEMORY" -cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH" org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend - --driver-url $SPARK_DRIVER_URL - --executor-id $SPARK_EXECUTOR_ID - --cores $SPARK_EXECUTOR_CORES - --app-id $SPARK_APPLICATION_ID - --hostname $SPARK_EXECUTOR_POD_IP - --resourceProfileId $SPARK_RESOURCE_PROFILE_ID - --podName $SPARK_EXECUTOR_POD_NAME + --driver-url "$SPARK_DRIVER_URL" + --executor-id "$SPARK_EXECUTOR_ID" + --cores "$SPARK_EXECUTOR_CORES" + --app-id "$SPARK_APPLICATION_ID" + --hostname "$SPARK_EXECUTOR_POD_IP" + --resourceProfileId "$SPARK_RESOURCE_PROFILE_ID" + --podName "$SPARK_EXECUTOR_POD_NAME" ) attempt_setup_fake_passwd_entry # Execute the container CMD under tini for better hygiene diff --git a/3.4.1/scala2.12-java11-ubuntu/entrypoint.sh b/3.4.1/scala2.12-java11-ubuntu/entrypoint.sh index 08fc925..2e3d2a8 100755 --- a/3.4.1/scala2.12-java11-ubuntu/entrypoint.sh +++ b/3.4.1/scala2.12-java11-ubuntu/entrypoint.sh @@ -15,6 +15,9 @@ # See the License for the specific language governing permissions and # limitations under the License. # +# Prevent any errors fr
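The quoting concern behind this change can be demonstrated with a tiny sketch. The URL value below is hypothetical, chosen deliberately to contain a space so that word splitting is visible:

```shell
# Word splitting: an unquoted variable expands into multiple arguments.
SPARK_DRIVER_URL="spark://driver host:7077"    # hypothetical value with a space

set -- --driver-url $SPARK_DRIVER_URL          # unquoted: splits into 3 arguments
unquoted=$#

set -- --driver-url "$SPARK_DRIVER_URL"        # quoted: stays a single argument (2 total)
quoted=$#

echo "unquoted=$unquoted quoted=$quoted"
```

This is why the patch quotes `--driver-url "$SPARK_DRIVER_URL"` and friends: without the quotes, a value containing whitespace would be passed to the JVM as several broken arguments.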
[spark-docker] branch master updated: [SPARK-44176] Change apt to apt-get and remove useless cleanup
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 6f1a0a5 [SPARK-44176] Change apt to apt-get and remove useless cleanup 6f1a0a5 is described below commit 6f1a0a5fbb8034ebc4ea04e4f0b2fda728a4dd1e Author: Yikun Jiang AuthorDate: Tue Jun 27 08:56:54 2023 +0800 [SPARK-44176] Change apt to apt-get and remove useless cleanup ### What changes were proposed in this pull request? This patch changes `apt` to `apt-get` and also removes the useless `rm -rf /var/cache/apt/*; \`. It also applies the change to 3.4.0 and 3.4.1 ### Why are the changes needed? Address comments from DOI: - `apt install ...`, This should be apt-get (apt is not intended for unattended use, as the warning during build makes clear). - `rm -rf /var/cache/apt/*; \` This is harmless, but should be unnecessary (the base image configuration already makes sure this directory stays empty). See more in: [1] https://github.com/docker-library/official-images/pull/13089#issuecomment-1601813499 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #47 from Yikun/apt-get.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 5 ++--- 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile | 3 +-- 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile | 3 +-- 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 3 +-- 3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile | 5 ++--- 3.4.1/scala2.12-java11-python3-ubuntu/Dockerfile | 3 +-- 3.4.1/scala2.12-java11-r-ubuntu/Dockerfile | 3 +-- 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 3 +-- Dockerfile.template| 3 +-- r-python.template | 5 ++--- 10 files changed, 13 insertions(+), 23 deletions(-) diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile index 0f1962f..10aa23e 100644 --- a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -20,9 +20,8 @@ USER root RUN set -ex; \ apt-get update; \ -apt install -y python3 python3-pip; \ -apt install -y r-base r-base-dev; \ -rm -rf /var/cache/apt/*; \ +apt-get install -y python3 python3-pip; \ +apt-get install -y r-base r-base-dev; \ rm -rf /var/lib/apt/lists/* ENV R_HOME /usr/lib/R diff --git a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile index 258d806..3240e57 100644 --- a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile @@ -20,8 +20,7 @@ USER root RUN set -ex; \ apt-get update; \ -apt install -y python3 python3-pip; \ -rm -rf /var/cache/apt/*; \ +apt-get install -y python3 python3-pip; \ rm -rf /var/lib/apt/lists/* USER spark diff --git a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile index 4c928c6..266392f 100644 --- a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile @@ -20,8 +20,7 @@ USER root RUN set -ex; \ apt-get update; \ -apt install -y r-base r-base-dev; \ -rm -rf /var/cache/apt/*; \ +apt-get install -y 
r-base r-base-dev; \ rm -rf /var/lib/apt/lists/* ENV R_HOME /usr/lib/R diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-ubuntu/Dockerfile index aa754b7..77ace47 100644 --- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile @@ -24,7 +24,7 @@ RUN groupadd --system --gid=${spark_uid} spark && \ RUN set -ex; \ apt-get update; \ ln -s /lib /lib64; \ -apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu libnss-wrapper; \ +apt-get install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu libnss-wrapper; \ mkdir -p /opt/spark; \ mkdir /opt/spark/python; \ mkdir -p /opt/spark/examples; \ @@ -33,7 +33,6 @@ RUN set -ex; \ touch /opt/spark/RELEASE; \ chown -R spark:spark /opt/spark; \ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su; \ -rm -rf /var/cache/apt/*; \ rm -rf /var/lib/apt/lists/* # Install Apache Spark diff --git a/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile index 95c98b9..30e6b86 100644 --- a/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile +++ b/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -20,9 +20,8 @@ USER root RUN set -ex; \ apt-get updat
[spark-docker] branch master updated: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 6f36415 [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles 6f36415 is described below commit 6f3641534a97a80491cba926cc7a5e67972494ea Author: Yikun Jiang AuthorDate: Sun Jun 25 10:51:46 2023 +0800 [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles ### What changes were proposed in this pull request? Add Apache Spark 3.4.1 Dockerfiles. - Add 3.4.1 GPG key - Add .github/workflows/build_3.4.1.yaml - ./add-dockerfiles.sh 3.4.1 - Add version and tag info ### Why are the changes needed? Apache Spark 3.4.1 released: https://spark.apache.org/releases/spark-release-3-4-1.html ### Does this PR introduce _any_ user-facing change? Docker image will be published. ### How was this patch tested? Add workflow and CI passed Closes #46 from Yikun/3.4.1. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.4.1.yaml | 41 +++ .github/workflows/publish.yml | 3 +- .github/workflows/test.yml | 3 +- 3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile | 30 + 3.4.1/scala2.12-java11-python3-ubuntu/Dockerfile | 27 + 3.4.1/scala2.12-java11-r-ubuntu/Dockerfile | 29 + 3.4.1/scala2.12-java11-ubuntu/Dockerfile | 81 ++ 3.4.1/scala2.12-java11-ubuntu/entrypoint.sh| 123 + tools/template.py | 2 + versions.json | 42 +-- 10 files changed, 372 insertions(+), 9 deletions(-) diff --git a/.github/workflows/build_3.4.1.yaml b/.github/workflows/build_3.4.1.yaml new file mode 100644 index 000..2eba18e --- /dev/null +++ b/.github/workflows/build_3.4.1.yaml @@ -0,0 +1,41 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +name: "Build and Test (3.4.1)" + +on: + pull_request: +branches: + - 'master' +paths: + - '3.4.1/**' + +jobs: + run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] +name: Run +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: 3.4.1 + scala: 2.12 + java: 11 + image-type: ${{ matrix.image-type }} diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 3063bfe..1138a9f 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -25,9 +25,10 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.4.0' +default: '3.4.1' type: choice options: +- 3.4.1 - 3.4.0 - 3.3.2 - 3.3.1 diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 06e2321..4136f1c 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -25,9 +25,10 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.4.0' +default: '3.4.1' type: choice options: +- 3.4.1 - 3.4.0 - 3.3.2 - 3.3.1 diff --git a/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile new file mode 100644 index 000..95c98b9 --- /dev/null +++ b/3.4.1/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICEN
[spark-docker] branch master updated: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new c07ae18 [SPARK-43368] Use `libnss_wrapper` to fake passwd entry c07ae18 is described below commit c07ae18355678370fd270bedb8b39ab2aceb5ac2 Author: Yikun Jiang AuthorDate: Fri Jun 2 10:27:01 2023 +0800 [SPARK-43368] Use `libnss_wrapper` to fake passwd entry ### What changes were proposed in this pull request? Use `libnss_wrapper` to fake the passwd entry instead of changing `/etc/passwd`, to resolve the random UID problem. We also only attempt to set up the fake passwd entry for the driver/executor; for a cmd like `bash`, the fake passwd will not be set. ### Why are the changes needed? In the past, we added the entry to `/etc/passwd` directly for the current UID, mainly for the [OpenShift anonymous random `uid` case](https://github.com/docker-library/official-images/pull/13089#issuecomment-1534706523) (see also https://github.com/apache-spark-on-k8s/spark/pull/404), but this approach brings a potential security issue due to the wide permissions on `/etc/passwd`. According to the DOI reviewer's [suggestion](https://github.com/docker-library/official-images/pull/13089#issuecomment-1561793792), we'd better resolve this problem by using [libnss_wrapper](https://cwrap.org/nss_wrapper.html), a library that sets up a fake passwd entry by setting `LD_PRELOAD`, `NSS_WRAPPER_PASSWD`, and `NSS_WRAPPER_GROUP`.
For example, if the random UID is `1000`, the env will be: ``` spark6f41b8e5be9b:/opt/spark/work-dir$ id -u 1000 spark6f41b8e5be9b:/opt/spark/work-dir$ id -g 1000 spark6f41b8e5be9b:/opt/spark/work-dir$ whoami spark spark6f41b8e5be9b:/opt/spark/work-dir$ echo $LD_PRELOAD /usr/lib/libnss_wrapper.so spark6f41b8e5be9b:/opt/spark/work-dir$ echo $NSS_WRAPPER_PASSWD /tmp/tmp.r5x4SMX35B spark6f41b8e5be9b:/opt/spark/work-dir$ cat /tmp/tmp.r5x4SMX35B spark:x:1000:1000:${SPARK_USER_NAME:-anonymous uid}:/opt/spark:/bin/false spark6f41b8e5be9b:/opt/spark/work-dir$ echo $NSS_WRAPPER_GROUP /tmp/tmp.XcnnYuD68r spark6f41b8e5be9b:/opt/spark/work-dir$ cat /tmp/tmp.XcnnYuD68r spark:x:1000: ``` ### Does this PR introduce _any_ user-facing change? Yes, set up a fake ENV rather than changing `/etc/passwd`. ### How was this patch tested? 1. Without `attempt_setup_fake_passwd_entry`, the user is `I have no name!` ``` # docker run -it --rm --user 1000:1000 spark-test bash groups: cannot find name for group ID 1000 I have no name!998110cd5a26:/opt/spark/work-dir$ I have no name!0fea1d27d67d:/opt/spark/work-dir$ id -u 1000 I have no name!0fea1d27d67d:/opt/spark/work-dir$ id -g 1000 I have no name!0fea1d27d67d:/opt/spark/work-dir$ whoami whoami: cannot find name for user ID 1000 ``` 2. Manually stub `attempt_setup_fake_passwd_entry`; the user is `spark`. 2.1 Apply a tmp change to cmd ```patch diff --git a/entrypoint.sh.template b/entrypoint.sh.template index 08fc925..77d5b04 100644 --- a/entrypoint.sh.template +++ b/entrypoint.sh.template @@ -118,6 +118,7 @@ case "$1" in *) # Non-spark-on-k8s command provided, proceeding in pass-through mode... +attempt_setup_fake_passwd_entry exec "$@" ;; esac ``` 2.2 Build and run the image, specifying a random UID/GID 1000 ```bash $ docker build .
-t spark-test $ docker run -it --rm --user 1000:1000 spark-test bash # the user is set to spark rather than an unknown user spark6f41b8e5be9b:/opt/spark/work-dir$ spark6f41b8e5be9b:/opt/spark/work-dir$ id -u 1000 spark6f41b8e5be9b:/opt/spark/work-dir$ id -g 1000 spark6f41b8e5be9b:/opt/spark/work-dir$ whoami spark ``` ``` # NSS env is set right spark6f41b8e5be9b:/opt/spark/work-dir$ echo $LD_PRELOAD /usr/lib/libnss_wrapper.so spark6f41b8e5be9b:/opt/spark/work-dir$ echo $NSS_WRAPPER_PASSWD /tmp/tmp.r5x4SMX35B spark6f41b8e5be9b:/opt/spark/work-dir$ cat /tmp/tmp.r5x4SMX35B spark:x:1000:1000:${SPARK_USER_NAME:-anonymous uid}:/opt/spark:/bin/false spark6f41b8e5be9b:/opt/spark/work-dir$ echo $NSS_WRAPPER_GROUP /tmp/tmp.XcnnYuD68r spark6f41b8e5be9b:/opt/spark/work-dir$ cat /tmp/tmp.XcnnYuD68r spark:x:1000: ``` 3. If an existing user is specified (such as `spark`, `root`), there is no fake setup ```bash # docker run -it --rm --user 0 spark-test bash roote5bf55d4df22:/opt/spark/work-dir# echo $LD_PRELOAD ``` ```bash # docker run -it --rm spark-test bash sparkdef8d8ca4e7d:/opt/spark/work-dir$ echo $LD_PRELOAD ``` Closes #45 from Yikun/SPARK-43368. Auth
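The mechanism above boils down to writing passwd- and group-format lines into temp files and pointing nss_wrapper at them. Below is a minimal sketch; the unconditional setup and the library path are assumptions for illustration (the real entrypoint first checks `getent passwd` and only activates the wrapper when no entry exists for the current UID):

```shell
# Build fake passwd/group entries for the current (possibly anonymous) UID/GID.
myuid="$(id -u)"; mygid="$(id -g)"
NSS_WRAPPER_PASSWD="$(mktemp)"
NSS_WRAPPER_GROUP="$(mktemp)"
printf 'spark:x:%s:%s:anonymous uid:/opt/spark:/bin/false\n' "$myuid" "$mygid" > "$NSS_WRAPPER_PASSWD"
printf 'spark:x:%s:\n' "$mygid" > "$NSS_WRAPPER_GROUP"

# With libnss_wrapper installed, preloading it makes glibc name lookups
# (whoami, getpwuid, ...) read these files instead of /etc/passwd:
#   export LD_PRELOAD=/usr/lib/libnss_wrapper.so NSS_WRAPPER_PASSWD NSS_WRAPPER_GROUP
head -n1 "$NSS_WRAPPER_PASSWD"
```

Because the override lives entirely in environment variables and temp files, `/etc/passwd` itself is never touched, which is the security win over the old approach.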
[spark-docker] branch master updated: [SPARK-43370] Switch spark user only when run driver and executor
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 2dc12d9 [SPARK-43370] Switch spark user only when run driver and executor 2dc12d9 is described below commit 2dc12d96910710aa6ee2d717c4c723ddd75127a1 Author: Yikun Jiang AuthorDate: Thu Jun 1 14:36:17 2023 +0800 [SPARK-43370] Switch spark user only when run driver and executor ### What changes were proposed in this pull request? Switch to the spark user only when running the driver and executor ### Why are the changes needed? Address DOI comments: question 7 [1] [1] https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388 [2] https://github.com/docker-library/official-images/pull/13089#issuecomment-1561793792 ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? 1. Tested manually ``` cd ~/spark-docker/3.4.0/scala2.12-java11-ubuntu $ docker build . -t spark-test $ docker run -ti spark-test bash sparkafa78af05cf8:/opt/spark/work-dir$ $ docker run --user root -ti spark-test bash root095e0d7651fd:/opt/spark/work-dir# ``` 2. CI passed Closes: https://github.com/apache/spark-docker/pull/44 Closes #43 from Yikun/SPARK-43370.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 4 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile | 4 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile | 4 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 2 ++ 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh| 23 +++--- Dockerfile.template| 2 ++ entrypoint.sh.template | 23 +++--- r-python.template | 4 8 files changed, 44 insertions(+), 22 deletions(-) diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile index 7734100..0f1962f 100644 --- a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -16,6 +16,8 @@ # FROM spark:3.4.0-scala2.12-java11-ubuntu +USER root + RUN set -ex; \ apt-get update; \ apt install -y python3 python3-pip; \ @@ -24,3 +26,5 @@ RUN set -ex; \ rm -rf /var/lib/apt/lists/* ENV R_HOME /usr/lib/R + +USER spark diff --git a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile index 6c12c30..258d806 100644 --- a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile @@ -16,8 +16,12 @@ # FROM spark:3.4.0-scala2.12-java11-ubuntu +USER root + RUN set -ex; \ apt-get update; \ apt install -y python3 python3-pip; \ rm -rf /var/cache/apt/*; \ rm -rf /var/lib/apt/lists/* + +USER spark diff --git a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile index 24cd41a..4c928c6 100644 --- a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile @@ -16,6 +16,8 @@ # FROM spark:3.4.0-scala2.12-java11-ubuntu +USER root + RUN set -ex; \ apt-get update; \ apt install -y r-base r-base-dev; \ @@ -23,3 +25,5 @@ RUN set -ex; \ rm -rf /var/lib/apt/lists/* ENV R_HOME /usr/lib/R + +USER spark diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-ubuntu/Dockerfile index 
205b399..a680106 100644 --- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile @@ -77,4 +77,6 @@ ENV SPARK_HOME /opt/spark WORKDIR /opt/spark/work-dir +USER spark + ENTRYPOINT [ "/opt/entrypoint.sh" ] diff --git a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh index 716f1af..6def3f9 100755 --- a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh +++ b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh @@ -69,6 +69,13 @@ elif ! [ -z ${SPARK_HOME+x} ]; then SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH"; fi +# Switch to spark if no USER specified (root by default) otherwise use USER directly +switch_spark_if_root() { + if [ $(id -u) -eq 0 ]; then +echo gosu spark + fi +} + case "$1" in driver) shift 1 @@ -78,6 +85,8 @@ case "$1" in --deploy-mode client "$@" ) +# Execute the container CMD under tini for better hygiene +exec $(switch_spark_if_root) /usr/bin/tini -s -- "${CMD[@]}" ;; executor) shift 1 @@ -96,20 +105,12 @@ case "$1" in --resourceProfileId $SPARK_RESOURCE_PROFILE_ID --podName $SPARK_EXECUTOR_POD_NAME
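The `switch_spark_if_root` helper in the entrypoint diff above is a small but reusable pattern: emit `gosu spark` only when the process is root, so the final `exec` demotes the container CMD to the unprivileged spark user, while a container started with an explicit non-root `--user` runs unchanged. A minimal standalone sketch (the uid argument is added here purely so the behavior can be exercised without root; it is not part of the real entrypoint):

```shell
#!/bin/bash
# Sketch of the entrypoint user-switch pattern. When run as root, the
# function prints "gosu spark"; otherwise it prints nothing, so the
# command below runs either via gosu or directly.
switch_spark_if_root() {
    uid="${1:-$(id -u)}"   # uid override is for illustration/testing only
    if [ "$uid" -eq 0 ]; then
        echo gosu spark
    fi
}

# In the real entrypoint the result prefixes the final exec, e.g.:
#   exec $(switch_spark_if_root) /usr/bin/tini -s -- "${CMD[@]}"
switch_spark_if_root 0     # root: prints "gosu spark"
switch_spark_if_root 185   # the spark uid: prints nothing
```

Because the command substitution expands to nothing for non-root users, the same `exec` line serves both cases without branching.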
[spark-docker] branch master updated: [SPARK-43806] Add awesome-spark-docker.md
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 9d4c98c [SPARK-43806] Add awesome-spark-docker.md 9d4c98c is described below commit 9d4c98c62c4ce517e69e65d1f6f7bf412d775b75 Author: Yikun Jiang AuthorDate: Fri May 26 09:53:20 2023 +0800 [SPARK-43806] Add awesome-spark-docker.md ### What changes were proposed in this pull request? Add links to more related images and Dockerfile references. ### Why are the changes needed? Something we talked about in the "Spark on Kube Coffee Chats" [1]: add links to more related images and Dockerfile references. Initialized with [2]. [1] https://lists.apache.org/thread/26gpmlhqhk5cp2fhtzrpl5f61p8jc551 [2] https://github.com/awesome-spark/awesome-spark/blob/main/README.md#docker-images ### Does this PR introduce _any_ user-facing change? Doc only ### How was this patch tested? No Closes #28 from Yikun/awesome-spark-docker. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- awesome-spark-docker.md | 7 +++ 1 file changed, 7 insertions(+) diff --git a/awesome-spark-docker.md b/awesome-spark-docker.md new file mode 100644 index 000..c7bb840 --- /dev/null +++ b/awesome-spark-docker.md @@ -0,0 +1,7 @@ +A curated list of awesome Apache Spark Docker resources. + +- [jupyter/docker-stacks/pyspark-notebook](https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook) - PySpark with Jupyter Notebook. +- [big-data-europe/docker-spark](https://github.com/big-data-europe/docker-spark) - The standalone cluster and spark applications related Dockerfiles. +- [openeuler/spark](https://github.com/openeuler-mirror/openeuler-docker-images/tree/master/spark) - Dockerfile reference for dnf/yum based OS. 
+- [GoogleCloudPlatform/spark-on-k8s-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) - Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes. + - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-docker] branch master updated: [SPARK-43367] Recover sh in dockerfile
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new ce3e122 [SPARK-43367] Recover sh in dockerfile ce3e122 is described below commit ce3e12266ef82264b814f6f7823165f7c7ae215a Author: Yikun Jiang AuthorDate: Thu May 25 19:07:55 2023 +0800 [SPARK-43367] Recover sh in dockerfile ### What changes were proposed in this pull request? Recover `sh`; we originally removed `sh` due to https://github.com/apache-spark-on-k8s/spark/pull/444/files#r134075892 , but the `SPARK_DRIVER_JAVA_OPTS` related code has since moved to `entrypoint.sh` (which uses `#!/bin/bash`), so this workaround is no longer needed. See also: [1] https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388 [2] https://github.com/docker-library/official-images/pull/13089#issuecomment-1561793792 ### Why are the changes needed? Recover sh ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #41 from Yikun/SPARK-43367. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 2 -- Dockerfile.template | 2 -- 2 files changed, 4 deletions(-) diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-ubuntu/Dockerfile index 11f997f..205b399 100644 --- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile @@ -32,8 +32,6 @@ RUN set -ex; \ chmod g+w /opt/spark/work-dir; \ touch /opt/spark/RELEASE; \ chown -R spark:spark /opt/spark; \ -rm /bin/sh; \ -ln -sv /bin/bash /bin/sh; \ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su; \ chgrp root /etc/passwd && chmod ug+rw /etc/passwd; \ rm -rf /var/cache/apt/*; \ diff --git a/Dockerfile.template b/Dockerfile.template index 6e85cd3..8b13e4a 100644 --- a/Dockerfile.template +++ b/Dockerfile.template @@ -32,8 +32,6 @@ RUN set -ex; \ chmod g+w /opt/spark/work-dir; \ touch /opt/spark/RELEASE; \ chown -R spark:spark /opt/spark; \ -rm /bin/sh; \ -ln -sv /bin/bash /bin/sh; \ echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su; \ chgrp root /etc/passwd && chmod ug+rw /etc/passwd; \ rm -rf /var/cache/apt/*; \
[spark-docker] branch master updated: [SPARK-43793] Fix SPARK_EXECUTOR_JAVA_OPTS assignment bug
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 006e8fa [SPARK-43793] Fix SPARK_EXECUTOR_JAVA_OPTS assignment bug 006e8fa is described below commit 006e8fade69f148a05fc73f591f52c7678e48f04 Author: Yikun Jiang AuthorDate: Thu May 25 19:05:26 2023 +0800 [SPARK-43793] Fix SPARK_EXECUTOR_JAVA_OPTS assignment bug ### What changes were proposed in this pull request? The previous code was susceptible to a few bugs, particularly around newlines in values. ``` env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt ``` ### Why are the changes needed? To address DOI comments: https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388 , question 6. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. Tested manually ``` export SPARK_JAVA_OPT_0="foo=bar" export SPARK_JAVA_OPT_1="foo1=bar1" for v in "${!SPARK_JAVA_OPT_@}"; do SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" ) done for v in "${SPARK_EXECUTOR_JAVA_OPTS[@]}"; do echo $v done # foo=bar # foo1=bar1 ``` 2. CI passed Closes #42 from Yikun/SPARK-43793. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh | 5 +++-- entrypoint.sh.template | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh index 4bb1557..716f1af 100755 --- a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh +++ b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh @@ -38,8 +38,9 @@ if [ -z "$JAVA_HOME" ]; then fi SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*" -env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt -readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt +for v in "${!SPARK_JAVA_OPT_@}"; do +SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" ) +done if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH" diff --git a/entrypoint.sh.template b/entrypoint.sh.template index 4bb1557..716f1af 100644 --- a/entrypoint.sh.template +++ b/entrypoint.sh.template @@ -38,8 +38,9 @@ if [ -z "$JAVA_HOME" ]; then fi SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*" -env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt -readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt +for v in "${!SPARK_JAVA_OPT_@}"; do +SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" ) +done if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
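The replacement loop in this diff relies on two bash indirection features: `"${!PREFIX@}"` expands to the *names* of all variables whose names start with `PREFIX`, and `"${!v}"` then dereferences each name. Unlike the old `env | grep … > /tmp/java_opts.txt` pipeline, values are never serialized through a line-oriented file, so embedded newlines survive. A self-contained sketch:

```shell
#!/bin/bash
# Collect all SPARK_JAVA_OPT_* values into an array, preserving newlines.
export SPARK_JAVA_OPT_0="-Dfoo=bar"
export SPARK_JAVA_OPT_1=$'-Dmulti=line\nvalue'   # value containing a newline

SPARK_EXECUTOR_JAVA_OPTS=()
for v in "${!SPARK_JAVA_OPT_@}"; do          # expands to the variable *names*
    SPARK_EXECUTOR_JAVA_OPTS+=( "${!v}" )    # dereference each name
done

# Two variables -> two array elements, even with the embedded newline:
echo "${#SPARK_EXECUTOR_JAVA_OPTS[@]}"
```

The old `readarray` approach would have split `SPARK_JAVA_OPT_1` into two elements at the newline; here it stays a single, intact option.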
[spark-docker] branch master updated: [SPARK-43365][FOLLOWUP] Refactor publish workflow based on base image
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new f2d2b2d [SPARK-43365][FOLLOWUP] Refactor publish workflow based on base image f2d2b2d is described below commit f2d2b2d1ffbb951aed29221a040861327c09441e Author: Yikun Jiang AuthorDate: Thu May 25 16:13:44 2023 +0800 [SPARK-43365][FOLLOWUP] Refactor publish workflow based on base image ### What changes were proposed in this pull request? - This patch changes the `build-args` to `patch in test` in the build and publish workflows, because Docker official images do not support **parameterized FROM** values. https://github.com/docker-library/official-images/pull/13089#issuecomment-1555352902 - Also refactor the publish workflow: ![image](https://user-images.githubusercontent.com/1736354/236613626-96f8fbf6-7df7-4d10-b4fb-be4d57c56dce.png) ### Why are the changes needed? 
Same change with build workflow refactor, to avoid the publish issue like: ``` #5 [linux/amd64 internal] load metadata for docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu #5 ERROR: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed -- > [linux/amd64 internal] load metadata for docker.io/library/spark:3.4.0-scala2.12-java11-ubuntu: -- Dockerfile:18 16 | # 17 | ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu 18 | >>> FROM $BASE_IMAGE 19 | 20 | RUN set -ex && \ ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed Error: buildx failed with: ERROR: failed to solve: spark:3.4.0-scala2.12-java11-ubuntu: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Publish test in my local fork: - https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759: Skip the local base build use the [published base](https://github.com/Yikun/spark-docker/actions/runs/5076986823/jobs/9120029759#step:11:135) image: ![image](https://user-images.githubusercontent.com/1736354/236612540-2b454c14-e194-4d73-b859-0df001570d27.png) ``` #3 [linux/amd64 internal] load metadata for ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu #3 DONE 0.9s #4 [linux/arm64 internal] load metadata for ghcr.io/yikun/spark-docker/spark:3.4.0-scala2.12-java11-ubuntu #4 DONE 0.9s ``` - CI passed: do local base build first and build base on the local build Closes #39 from Yikun/publish-build. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/main.yml | 21 -- .github/workflows/publish.yml | 25 +- 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 3 +-- 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile | 3 +-- 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile | 3 +-- r-python.template | 3 +-- 6 files changed, 47 insertions(+), 11 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index c1d0c56..870c8c7 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -107,6 +107,9 @@ jobs: TEST_REPO=${{ inputs.repository }} UNIQUE_IMAGE_TAG=${{ inputs.image-tag }} fi + + # We can't use the real image for build because we haven't publish the image yet. + # The base image for build, it's something like localhost:5000/$REPO_OWNER/spark-docker/spark:3.3.0-scala2.12-java11-ubuntu BASE_IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$BASE_IMGAE_TAG IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG @@ -157,7 +160,8 @@ jobs: driver-opts: network=host - name: Build - Build the base image -if: ${{ inputs.build }} +# Don't need to build the base image when publish +if: ${{ inputs.build && !inputs.publish }} uses: docker/build-push-action@v3 with: context: ${{ env.BASE_IMAGE_PATH }} @@ -165,11 +169,24 @@ jobs: platforms: linux/amd64,linux/arm64 push: true + - name: Build - Use the test image repo when build +# Don't need to build the base image when publish +if: ${{ inputs.build && !inputs.publish }} +working-directory
[spark-docker] branch master updated: [SPARK-43372] Use ; instead of && when enable set -ex
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 7f9b414 [SPARK-43372] Use ; instead of && when enable set -ex 7f9b414 is described below commit 7f9b414de48639d69c64acfd81e6792517b86f61 Author: Yikun Jiang AuthorDate: Mon May 8 11:19:36 2023 +0800 [SPARK-43372] Use ; instead of && when enable set -ex ### What changes were proposed in this pull request? - Use ; instead of && when enable set -ex - ./add-dockerfiles.sh 3.4.0 to apply changes ### Why are the changes needed? Address DOI comments: `9. using set -ex means you can use ; instead of && (really only matters for complex expressions, like the || in the later RUN that does use ;)` https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #38 from Yikun/SPARK-43372. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 10 +++ 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile | 8 +++--- 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile | 8 +++--- 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 32 +++--- Dockerfile.template| 32 +++--- r-python.template | 10 +++ 6 files changed, 50 insertions(+), 50 deletions(-) diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile index 86337c5..12c7a4f 100644 --- a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -17,11 +17,11 @@ ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu FROM $BASE_IMAGE -RUN set -ex && \ -apt-get update && \ -apt install -y python3 python3-pip && \ -apt install -y r-base r-base-dev && \ -rm -rf /var/cache/apt/* && \ +RUN set -ex; \ +apt-get update; \ +apt install -y python3 python3-pip; \ +apt install -y r-base r-base-dev; \ +rm -rf /var/cache/apt/*; \ rm -rf /var/lib/apt/lists/* ENV R_HOME /usr/lib/R diff --git a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile index 540805f..1f0dd1f 100644 --- a/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile @@ -17,8 +17,8 @@ ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu FROM $BASE_IMAGE -RUN set -ex && \ -apt-get update && \ -apt install -y python3 python3-pip && \ -rm -rf /var/cache/apt/* && \ +RUN set -ex; \ +apt-get update; \ +apt install -y python3 python3-pip; \ +rm -rf /var/cache/apt/*; \ rm -rf /var/lib/apt/lists/* diff --git a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile index c65c2ce..53647b2 100644 --- a/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-r-ubuntu/Dockerfile @@ -17,10 +17,10 @@ ARG BASE_IMAGE=spark:3.4.0-scala2.12-java11-ubuntu FROM $BASE_IMAGE -RUN set -ex && \ 
-apt-get update && \ -apt install -y r-base r-base-dev && \ -rm -rf /var/cache/apt/* && \ +RUN set -ex; \ +apt-get update; \ +apt install -y r-base r-base-dev; \ +rm -rf /var/cache/apt/*; \ rm -rf /var/lib/apt/lists/* ENV R_HOME /usr/lib/R diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-ubuntu/Dockerfile index 21d95d4..11f997f 100644 --- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile @@ -21,22 +21,22 @@ ARG spark_uid=185 RUN groupadd --system --gid=${spark_uid} spark && \ useradd --system --uid=${spark_uid} --gid=spark spark -RUN set -ex && \ -apt-get update && \ -ln -s /lib /lib64 && \ -apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu && \ -mkdir -p /opt/spark && \ -mkdir /opt/spark/python && \ -mkdir -p /opt/spark/examples && \ -mkdir -p /opt/spark/work-dir && \ -chmod g+w /opt/spark/work-dir && \ -touch /opt/spark/RELEASE && \ -chown -R spark:spark /opt/spark && \ -rm /bin/sh && \ -ln -sv /bin/bash /bin/sh && \ -echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \ -chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \ -rm -rf /var/cache/apt/* && \ +RUN set -ex; \ +
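The rationale behind this diff is that `set -e` already aborts the `RUN` shell on the first failing command, so chaining every step with `&&` is redundant; plain `;` separators give the same fail-fast behavior with less noise. A small runnable sketch (the paths are hypothetical, not from the real Dockerfile):

```shell
#!/bin/sh
set -ex
# Under `set -e`, each command below aborts the whole script on failure,
# so `;`-separated (i.e. newline-separated) steps behave like an `&&`
# chain inside a Dockerfile RUN instruction:
dir="$(mktemp -d)"; \
mkdir -p "$dir/work-dir"; \
touch "$dir/RELEASE"; \
chmod g+w "$dir/work-dir"
echo "all steps succeeded"
```

If any step failed (say `mkdir` hit a permission error), `set -e` would stop the script before `echo` runs, exactly as `&&` would have.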
[spark-docker] branch master updated: [SPARK-43371] Minimize duplication across layers for chmod
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 406eb86 [SPARK-43371] Minimize duplication across layers for chmod 406eb86 is described below commit 406eb86c2cc722458e0a4787e759802dda5c73eb Author: Yikun Jiang AuthorDate: Sat May 6 17:24:12 2023 +0800 [SPARK-43371] Minimize duplication across layers for chmod ### What changes were proposed in this pull request? This patch minimizes duplication across layers for chmod: - Move `chmod g+w /opt/spark/work-dir` to layer of `/opt/spark/work-dir` creation - Move `chmod a+x /opt/decom.sh` to layer of spark extration layer. - `chmod a+x $VERSION/$TAG/entrypoint.sh` when generate the entrypoint.sh - ./add-dockerfiles.sh 3.4.0 to apply changes ### Why are the changes needed? Address DOI review comments to minimize duplication across layers for chmod > To minimize duplication across layers, chmod's should be done in the layer that creates the file/folder (or in the case of a file from the context via COPY, it should have the +x committed to git) https://github.com/docker-library/official-images/pull/13089#issuecomment-1533540388 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #37 from Yikun/SPARK-43371. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.4.0/scala2.12-java11-ubuntu/Dockerfile| 5 ++--- 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh | 0 Dockerfile.template | 5 ++--- add-dockerfiles.sh | 1 + 4 files changed, 5 insertions(+), 6 deletions(-) diff --git a/3.4.0/scala2.12-java11-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-ubuntu/Dockerfile index 997b8d3..21d95d4 100644 --- a/3.4.0/scala2.12-java11-ubuntu/Dockerfile +++ b/3.4.0/scala2.12-java11-ubuntu/Dockerfile @@ -29,6 +29,7 @@ RUN set -ex && \ mkdir /opt/spark/python && \ mkdir -p /opt/spark/examples && \ mkdir -p /opt/spark/work-dir && \ +chmod g+w /opt/spark/work-dir && \ touch /opt/spark/RELEASE && \ chown -R spark:spark /opt/spark && \ rm /bin/sh && \ @@ -68,6 +69,7 @@ RUN set -ex; \ mv python/pyspark /opt/spark/python/pyspark/; \ mv python/lib /opt/spark/python/lib/; \ mv R /opt/spark/; \ +chmod a+x /opt/decom.sh; \ cd ..; \ rm -rf "$SPARK_TMP"; @@ -76,8 +78,5 @@ COPY entrypoint.sh /opt/ ENV SPARK_HOME /opt/spark WORKDIR /opt/spark/work-dir -RUN chmod g+w /opt/spark/work-dir -RUN chmod a+x /opt/decom.sh -RUN chmod a+x /opt/entrypoint.sh ENTRYPOINT [ "/opt/entrypoint.sh" ] diff --git a/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh b/3.4.0/scala2.12-java11-ubuntu/entrypoint.sh old mode 100644 new mode 100755 diff --git a/Dockerfile.template b/Dockerfile.template index 5fe4f25..db01a87 100644 --- a/Dockerfile.template +++ b/Dockerfile.template @@ -29,6 +29,7 @@ RUN set -ex && \ mkdir /opt/spark/python && \ mkdir -p /opt/spark/examples && \ mkdir -p /opt/spark/work-dir && \ +chmod g+w /opt/spark/work-dir && \ touch /opt/spark/RELEASE && \ chown -R spark:spark /opt/spark && \ rm /bin/sh && \ @@ -68,6 +69,7 @@ RUN set -ex; \ mv python/pyspark /opt/spark/python/pyspark/; \ mv python/lib /opt/spark/python/lib/; \ mv R /opt/spark/; \ +chmod a+x /opt/decom.sh; \ cd ..; \ rm -rf "$SPARK_TMP"; @@ -76,8 +78,5 @@ COPY entrypoint.sh /opt/ ENV SPARK_HOME /opt/spark WORKDIR /opt/spark/work-dir -RUN chmod g+w 
/opt/spark/work-dir -RUN chmod a+x /opt/decom.sh -RUN chmod a+x /opt/entrypoint.sh ENTRYPOINT [ "/opt/entrypoint.sh" ] diff --git a/add-dockerfiles.sh b/add-dockerfiles.sh index 7dcd7b0..d61601e 100755 --- a/add-dockerfiles.sh +++ b/add-dockerfiles.sh @@ -52,6 +52,7 @@ for TAG in $TAGS; do if [ "$TAG" == "scala2.12-java11-ubuntu" ]; then python3 tools/template.py $OPTS > $VERSION/$TAG/Dockerfile python3 tools/template.py $OPTS -f entrypoint.sh.template > $VERSION/$TAG/entrypoint.sh +chmod a+x $VERSION/$TAG/entrypoint.sh else python3 tools/template.py $OPTS -f r-python.template > $VERSION/$TAG/Dockerfile fi
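The rule behind the DOI comment addressed here is that a `chmod` in its own `RUN` instruction re-commits the affected files into a fresh layer, duplicating their contents; doing the `chmod` in the layer that creates the path avoids that, and files copied from the build context should carry their executable bit in git instead. A hypothetical before/after fragment illustrating the pattern:

```dockerfile
# Before: each standalone RUN adds a layer that duplicates the files it touches.
#   RUN chmod g+w /opt/spark/work-dir
#   RUN chmod a+x /opt/entrypoint.sh

# After: the permission change lives in the same layer that creates the path...
RUN set -ex; \
    mkdir -p /opt/spark/work-dir; \
    chmod g+w /opt/spark/work-dir

# ...and entrypoint.sh is committed to git with mode 100755, so no
# RUN chmod is needed after the COPY at all.
COPY entrypoint.sh /opt/
```

This matches the mode change in the diff above (`entrypoint.sh` flipped from `100644` to `100755`) and the `chmod a+x` added to `add-dockerfiles.sh` so regenerated entrypoints keep the bit.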
[spark-docker] branch master updated: [SPARK-43365] Refactor Dockerfile and workflow based on base image
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 7f83637 [SPARK-43365] Refactor Dockerfile and workflow based on base image 7f83637 is described below commit 7f836378d8bfe453b7e1dba304b54cb1cfacda49 Author: Yikun Jiang AuthorDate: Sat May 6 09:15:41 2023 +0800 [SPARK-43365] Refactor Dockerfile and workflow based on base image ### What changes were proposed in this pull request? This PR changes Dockerfile and workflow based on base image to save space by sharing layers by having one image from another. After this PR: - The spark / PySpark / SparkR related files extract into base image - Install PySpark / SparkR deps in PySpark / SparkR images. - Add the base image build step - Apply changes to template: `./add-dockerfiles.sh 3.4.0` to make it work. - This PR didn't contain changes on 3.3.X Dockerfiles to make PR more clear, the 3.3.x changes will be a separate PR when we address all comments for 3.4.0. [1] https://github.com/docker-library/official-images/pull/13089?notification_referrer_id=NT_kwDOABp-orI0MzIwMzMwNzY5OjE3MzYzNTQ#issuecomment-1533540388 ### Why are the changes needed? Address DOI comments, and also to save space by sharing layers by having one image from another. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed. Closes #36 from Yikun/official. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/main.yml | 20 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 63 +--- .../entrypoint.sh | 114 - 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile | 63 +--- .../scala2.12-java11-python3-ubuntu/entrypoint.sh | 114 - 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile | 60 +-- 3.4.0/scala2.12-java11-r-ubuntu/entrypoint.sh | 107 --- 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 4 + 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh| 7 ++ Dockerfile.template| 15 --- add-dockerfiles.sh | 9 +- entrypoint.sh.template | 2 - add-dockerfiles.sh => r-python.template| 54 +++--- tools/template.py | 16 +++ 14 files changed, 77 insertions(+), 571 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index fd37990..c1d0c56 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -91,10 +91,12 @@ jobs: scala) SUFFIX=ubuntu ;; esac + BASE_IMGAE_TAG=${{ inputs.spark }}-scala${{ inputs.scala }}-java${{ inputs.java }}-ubuntu TAG=scala${{ inputs.scala }}-java${{ inputs.java }}-$SUFFIX IMAGE_NAME=spark IMAGE_PATH=${{ inputs.spark }}/$TAG + BASE_IMAGE_PATH=${{ inputs.spark }}/scala${{ inputs.scala }}-java${{ inputs.java }}-ubuntu if [ "${{ inputs.build }}" == "true" ]; then # Use the local registry to build and test REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]') @@ -105,6 +107,7 @@ jobs: TEST_REPO=${{ inputs.repository }} UNIQUE_IMAGE_TAG=${{ inputs.image-tag }} fi + BASE_IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$BASE_IMGAE_TAG IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG PUBLISH_REPO=${{ inputs.repository }} @@ -116,8 +119,12 @@ jobs: echo "TEST_REPO=${TEST_REPO}" >> $GITHUB_ENV # Image name: spark echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV + # Base Image Dockerfile: 3.3.0/scala2.12-java11-ubuntu + echo "BASE_IMAGE_PATH=${BASE_IMAGE_PATH}" >> $GITHUB_ENV # Image dockerfile path: 3.3.0/scala2.12-java11-python3-ubuntu echo "IMAGE_PATH=${IMAGE_PATH}" 
>> $GITHUB_ENV + # Base Image URL: spark:3.3.0-scala2.12-java11-ubuntu + echo "BASE_IMAGE_URL=${BASE_IMAGE_URL}" >> $GITHUB_ENV # Image URL: ghcr.io/apache/spark-docker/spark:3.3.0-scala2.12-java11-python3-ubuntu echo "IMAGE_URL=${IMAGE_URL}" >> $GITHUB_ENV @@ -132,6 +139,9 @@ jobs: echo "IMAGE_PATH: "${IMAGE_PATH} echo "IMAGE_URL: "${IMAGE_URL} + echo "BASE_IMAGE_PATH: "${BASE_IMAGE_PATH} + echo "BASE_IMAGE_URL: "${BASE_IMAGE_URL} +
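The space saving described in this commit comes from ordinary image layering: once the PySpark/SparkR variants start `FROM` the Spark base image, the large Spark-distribution layers exist once and are shared by every variant. A sketch of the resulting Python image shape, mirroring the diffs above (tag written out literally, since a later follow-up notes that Docker official images do not allow a parameterized `FROM`):

```dockerfile
# Hypothetical shape of the refactored PySpark image: everything
# Spark-specific lives in the shared base layers; this file only
# adds the Python runtime on top.
FROM spark:3.4.0-scala2.12-java11-ubuntu

RUN set -ex; \
    apt-get update; \
    apt install -y python3 python3-pip; \
    rm -rf /var/cache/apt/*; \
    rm -rf /var/lib/apt/lists/*
```

Pulling several variants then transfers the heavy base layers only once; only the small apt layer differs per image.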
[spark-docker] branch master updated: [SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new fe05e38 [SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles fe05e38 is described below commit fe05e38f0ffad271edccd6ae40a77d5f14f3eef7 Author: Yikun Jiang AuthorDate: Tue Apr 18 10:58:59 2023 +0800 [SPARK-43148] Add Apache Spark 3.4.0 Dockerfiles ### What changes were proposed in this pull request? Add Apache Spark 3.4.0 Dockerfiles. - Add 3.4.0 GPG key - Add .github/workflows/build_3.4.0.yaml - ./add-dockerfiles.sh 3.4.0 ### Why are the changes needed? Apache Spark 3.4.0 released: https://spark.apache.org/releases/spark-release-3-4-0.html ### Does this PR introduce _any_ user-facing change? Yes, a new image will be published in future (after DOI review) ### How was this patch tested? Add workflow and CI passed Closes #33 from Yikun/3.4.0. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.4.0.yaml | 43 .github/workflows/publish.yml | 6 +- .github/workflows/test.yml | 6 +- 3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 86 .../entrypoint.sh | 114 + 3.4.0/scala2.12-java11-python3-ubuntu/Dockerfile | 83 +++ .../scala2.12-java11-python3-ubuntu/entrypoint.sh | 114 + 3.4.0/scala2.12-java11-r-ubuntu/Dockerfile | 82 +++ 3.4.0/scala2.12-java11-r-ubuntu/entrypoint.sh | 107 +++ 3.4.0/scala2.12-java11-ubuntu/Dockerfile | 79 ++ 3.4.0/scala2.12-java11-ubuntu/entrypoint.sh| 107 +++ tools/template.py | 2 + versions.json | 42 ++-- 13 files changed, 860 insertions(+), 11 deletions(-) diff --git a/.github/workflows/build_3.4.0.yaml b/.github/workflows/build_3.4.0.yaml new file mode 100644 index 000..8dd4e1e --- /dev/null +++ b/.github/workflows/build_3.4.0.yaml @@ -0,0 +1,43 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +name: "Build and Test (3.4.0)" + +on: + pull_request: +branches: + - 'master' +paths: + - '3.4.0/**' + - '.github/workflows/build_3.4.0.yaml' + - '.github/workflows/main.yml' + +jobs: + run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] +name: Run +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: 3.4.0 + scala: 2.12 + java: 11 + image-type: ${{ matrix.image-type }} diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 2941cfb..70b88b8 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -25,11 +25,13 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.3.0' +default: '3.4.0' type: choice options: -- 3.3.0 +- 3.4.0 +- 3.3.2 - 3.3.1 +- 3.3.0 publish: description: 'Publish the image or not.' default: false diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index efb401b..06e2321 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -25,11 +25,13 @@ on: spark: description: 'The Spark version of Spark image.' required: true -default: '3.3.1' +default: '3.4.0' type: choice options: -- 3.3.0 +- 3.4.0 +- 3.3.2 - 3.3.1 +- 3.3.0 java: description: 'The Java version of Spark image.' 
default: 11 diff --git a/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.4.0/scala2.12-java11-python3-r-ubuntu/Dockerfile new file mode 100644 index 00
[spark-docker] branch master updated: [SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 02bc905 [SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1 02bc905 is described below commit 02bc9054d757f8defbc2baf6af1d2a9aa84b2b35 Author: Yikun Jiang AuthorDate: Tue Feb 21 17:02:29 2023 +0800 [SPARK-42505] Apply entrypoint template change to 3.3.0/3.3.1 ### What changes were proposed in this pull request? Apply entrypoint template change to 3.3.0/3.3.1 ### Why are the changes needed? We removed the redundant PySpark related vars in https://github.com/apache/spark-docker/commit/e8f5b0a1151c349d9c7fdb09cf76300b42a6946b . This change should also be applied to 3.3.0/3.3.1. ### Does this PR introduce _any_ user-facing change? No, because the image hasn't been published yet. ### How was this patch tested? CI for 3.3.0/3.3.1 passed Closes #31 from Yikun/SPARK-42505. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- 3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh | 7 --- 3.3.0/scala2.12-java11-ubuntu/entrypoint.sh | 7 --- 3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh | 7 --- 3.3.1/scala2.12-java11-ubuntu/entrypoint.sh | 7 --- 4 files changed, 28 deletions(-) diff --git a/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh b/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh index 4bb1557..159d539 100644 --- a/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh +++ b/3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh @@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH" fi -if ! [ -z ${PYSPARK_PYTHON+x} ]; then -export PYSPARK_PYTHON -fi -if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then -export PYSPARK_DRIVER_PYTHON -fi - # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor. 
# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s. if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then diff --git a/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh b/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh index 4bb1557..159d539 100644 --- a/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh +++ b/3.3.0/scala2.12-java11-ubuntu/entrypoint.sh @@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH" fi -if ! [ -z ${PYSPARK_PYTHON+x} ]; then -export PYSPARK_PYTHON -fi -if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then -export PYSPARK_DRIVER_PYTHON -fi - # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor. # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s. if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then diff --git a/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh b/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh index 4bb1557..159d539 100644 --- a/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh +++ b/3.3.1/scala2.12-java11-r-ubuntu/entrypoint.sh @@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH" fi -if ! [ -z ${PYSPARK_PYTHON+x} ]; then -export PYSPARK_PYTHON -fi -if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then -export PYSPARK_DRIVER_PYTHON -fi - # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor. # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s. 
if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then diff --git a/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh b/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh index 4bb1557..159d539 100644 --- a/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh +++ b/3.3.1/scala2.12-java11-ubuntu/entrypoint.sh @@ -45,13 +45,6 @@ if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH" fi -if ! [ -z ${PYSPARK_PYTHON+x} ]; then -export PYSPARK_PYTHON -fi -if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then -export PYSPARK_DRIVER_PYTHON -fi - # If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor. # It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s. if [ -n "${HADOOP_HOME}" ] &
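For reference, the `! [ -z ${VAR+x} ]` test in the removed blocks is the POSIX idiom for "variable is defined, even if empty". A standalone sketch of the idiom (the `check` helper is ours, not part of any entrypoint.sh):

```shell
#!/bin/sh
# ${VAR+x} expands to "x" only when VAR is set (even to the empty string),
# so `! [ -z "${VAR+x}" ]` distinguishes "set but empty" from "unset".
check() {
  if ! [ -z "${PYSPARK_PYTHON+x}" ]; then
    echo "set"
  else
    echo "unset"
  fi
}

unset PYSPARK_PYTHON
check               # prints "unset"

PYSPARK_PYTHON=""   # defined, but empty
check               # prints "set"
```

The removed lines merely re-exported the variables when they were already defined, which the template change identified as redundant.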
[spark-docker] branch master updated: [SPARK-42494] Add official image Dockerfile for Spark v3.3.2
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new e8f5b0a [SPARK-42494] Add official image Dockerfile for Spark v3.3.2 e8f5b0a is described below commit e8f5b0a1151c349d9c7fdb09cf76300b42a6946b Author: Yikun Jiang AuthorDate: Tue Feb 21 14:22:19 2023 +0800 [SPARK-42494] Add official image Dockerfile for Spark v3.3.2 ### What changes were proposed in this pull request? Add the Apache Spark 3.3.2 Dockerfiles. - Add the 3.3.2 GPG key - Add .github/workflows/build_3.3.2.yaml - ./add-dockerfiles.sh 3.3.2 ### Why are the changes needed? Apache Spark 3.3.2 was released. https://lists.apache.org/thread/k8skf16wyn6rg9n0vd0t6l3bhw7c9svq ### Does this PR introduce _any_ user-facing change? Yes, in future: the new image will be published (after DOI review) ### How was this patch tested? Added the workflow and CI passed Closes #30 from Yikun/SPARK-42494. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.3.2.yaml | 43 +++ 3.3.2/scala2.12-java11-python3-r-ubuntu/Dockerfile | 86 ++ .../entrypoint.sh | 0 3.3.2/scala2.12-java11-python3-ubuntu/Dockerfile | 83 + .../scala2.12-java11-python3-ubuntu/entrypoint.sh | 0 3.3.2/scala2.12-java11-r-ubuntu/Dockerfile | 82 + .../scala2.12-java11-r-ubuntu/entrypoint.sh| 7 -- 3.3.2/scala2.12-java11-ubuntu/Dockerfile | 79 .../scala2.12-java11-ubuntu/entrypoint.sh | 7 -- add-dockerfiles.sh | 2 +- entrypoint.sh.template | 2 + tools/template.py | 2 + 12 files changed, 378 insertions(+), 15 deletions(-) diff --git a/.github/workflows/build_3.3.2.yaml b/.github/workflows/build_3.3.2.yaml new file mode 100644 index 000..9ae1a13 --- /dev/null +++ b/.github/workflows/build_3.3.2.yaml @@ -0,0 +1,43 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +name: "Build and Test (3.3.2)" + +on: + pull_request: +branches: + - 'master' +paths: + - '3.3.2/**' + - '.github/workflows/build_3.3.2.yaml' + - '.github/workflows/main.yml' + +jobs: + run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] +name: Run +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: 3.3.2 + scala: 2.12 + java: 11 + image-type: ${{ matrix.image-type }} diff --git a/3.3.2/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.3.2/scala2.12-java11-python3-r-ubuntu/Dockerfile new file mode 100644 index 000..b518021 --- /dev/null +++ b/3.3.2/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -0,0 +1,86 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +FROM eclipse-temurin:11-jre-focal + +ARG spark_uid=185 + +RUN groupadd --system --gid=${spark_uid} spark && \ +useradd --system --uid=${spark_uid} --gid=spark spark + +RUN set -ex && \ +apt-get update && \ +ln -s /lib /lib64 && \ +apt insta
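The `build_3.3.2.yaml` workflow above fans out one reusable-workflow run per image flavor via its job matrix. A small illustrative expansion of that matrix in Python (the dict names are ours; GitHub Actions performs this expansion internally):

```python
# The matrix axis and fixed inputs, mirroring build_3.3.2.yaml.
matrix = {"image-type": ["all", "python", "scala", "r"]}
fixed_inputs = {"spark": "3.3.2", "scala": "2.12", "java": "11"}

# One invocation of the reusable main.yml workflow per matrix entry.
jobs = [{**fixed_inputs, "image-type": t} for t in matrix["image-type"]]

for job in jobs:
    print(job)
```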
[spark] branch master updated: [SPARK-42214][INFRA] Enable infra image build for scheduled job
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f348d4fb9ff [SPARK-42214][INFRA] Enable infra image build for scheduled job f348d4fb9ff is described below commit f348d4fb9ffabc490b7c5294cd15eed2a74f2b60 Author: Yikun Jiang AuthorDate: Sat Jan 28 18:01:57 2023 +0800 [SPARK-42214][INFRA] Enable infra image build for scheduled job ### What changes were proposed in this pull request? Enable infra image build for scheduled job. The branch scheduled job is based on master branch workflow, so we need to enable the infra image for master branch / branch (3.4+). (except 3.2/3.3) ### Why are the changes needed? Enable infra image build for scheduled job. ### Does this PR introduce _any_ user-facing change? No, infra only ### How was this patch tested? - CI passed (to make sure master branch job passed) - Manually review and check the scheduled job after merge: https://github.com/apache/spark/actions/workflows/build_branch34.yml Closes #39778 from Yikun/SPARK-42214. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_and_test.yml | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 54b3d1d19d4..021566a5b8e 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -58,8 +58,8 @@ jobs: required: ${{ steps.set-outputs.outputs.required }} image_url: >- ${{ - (inputs.branch == 'master' && steps.infra-image-outputs.outputs.image_url) - || 'dongjoon/apache-spark-github-action-image:20220207' + ((inputs.branch == 'branch-3.2' || inputs.branch == 'branch-3.3') && 'dongjoon/apache-spark-github-action-image:20220207') + || steps.infra-image-outputs.outputs.image_url }} steps: - name: Checkout Spark repository @@ -268,12 +268,12 @@ jobs: infra-image: name: "Base image build" needs: precondition -# Currently, only enable docker build from cache for `master` branch jobs +# Currently, enable docker build from cache for `master` and branch (since 3.4) jobs if: >- (fromJson(needs.precondition.outputs.required).pyspark == 'true' || fromJson(needs.precondition.outputs.required).lint == 'true' || fromJson(needs.precondition.outputs.required).sparkr == 'true') && - inputs.branch == 'master' + (inputs.branch != 'branch-3.2' && inputs.branch != 'branch-3.3') runs-on: ubuntu-latest permissions: packages: write - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
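The `image_url` expression in the diff above uses GitHub Actions' `&&`/`||` short-circuiting as a conditional. Restated as a hypothetical Python helper (the function name and the infra image URL are illustrative only; the pinned image string comes from the diff):

```python
PINNED_IMAGE = "dongjoon/apache-spark-github-action-image:20220207"

def select_image_url(branch: str, infra_image_url: str) -> str:
    """After SPARK-42214: only branch-3.2/branch-3.3 keep the frozen image;
    master and 3.4+ branches use the cache-built infra image."""
    if branch in ("branch-3.2", "branch-3.3"):
        return PINNED_IMAGE
    return infra_image_url

print(select_image_url("branch-3.3", "ghcr.io/example/infra:latest"))  # pinned image
print(select_image_url("master", "ghcr.io/example/infra:latest"))      # infra image
```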
[spark-docker] branch master updated: [SPARK-40520] Add support to generate DOI manifest
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 7bb8661 [SPARK-40520] Add support to generate DOI manifest 7bb8661 is described below commit 7bb8661f7d57356f94fd5874696df1b1c058cb0b Author: Yikun Jiang AuthorDate: Wed Dec 21 10:15:44 2022 +0800 [SPARK-40520] Add support to generate DOI manifest ### What changes were proposed in this pull request? This patch adds support for generating the DOI manifest from versions.json. ### Why are the changes needed? To help generate the DOI manifest ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? ```shell $ flake8 ./tools/manifest.py --max-line-length=100 $ black ./tools/manifest.py All done! ✨ ✨ 1 file left unchanged. ``` ```shell $ tools/manifest.py manifest Maintainers: Apache Spark Developers (ApacheSpark) GitRepo: https://github.com/apache/spark-docker.git Tags: 3.3.1-scala2.12-java11-python3-ubuntu, 3.3.1-python3, 3.3.1, python3, latest Architectures: amd64, arm64v8 GitCommit: 496edb6dee0ade08bc5d180d7a6da0ff8b5d91ff Directory: ./3.3.1/scala2.12-java11-python3-ubuntu Tags: 3.3.1-scala2.12-java11-r-ubuntu, 3.3.1-r, r Architectures: amd64, arm64v8 GitCommit: 496edb6dee0ade08bc5d180d7a6da0ff8b5d91ff Directory: ./3.3.1/scala2.12-java11-r-ubuntu // ... ... ``` Closes #27 from Yikun/SPARK-40520.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- tools/manifest.py | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/tools/manifest.py b/tools/manifest.py index fbfad6f..13bc631 100755 --- a/tools/manifest.py +++ b/tools/manifest.py @@ -19,7 +19,33 @@ from argparse import ArgumentParser import json -from statistics import mode +import subprocess + + +def run_cmd(cmd): +if isinstance(cmd, list): +return subprocess.check_output(cmd).decode("utf-8") +else: +return subprocess.check_output(cmd.split(" ")).decode("utf-8") + + +def generate_manifest(versions): +output = ( +"Maintainers: Apache Spark Developers (@ApacheSpark)\n" +"GitRepo: https://github.com/apache/spark-docker.git\n\n" +) +git_commit = run_cmd("git rev-parse HEAD").replace("\n", "") +content = ( +"Tags: %s\n" +"Architectures: amd64, arm64v8\n" +"GitCommit: %s\n" +"Directory: ./%s\n\n" +) +for version in versions: +tags = ", ".join(version["tags"]) +path = version["path"] +output += content % (tags, git_commit, path) +return output def parse_opts(): @@ -27,7 +53,7 @@ def parse_opts(): parser.add_argument( dest="mode", -choices=["tags"], +choices=["tags", "manifest"], type=str, help="The print mode of script", ) @@ -76,6 +102,10 @@ def main(): # Get matched version's tags tags = versions[0]["tags"] if versions else [] print(",".join(["%s:%s" % (image, t) for t in tags])) +elif mode == "manifest": +with open(version_file, "r") as f: +versions = json.load(f).get("versions") +print(generate_manifest(versions)) if __name__ == "__main__":
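A condensed, runnable sketch of the `generate_manifest` logic added in this commit. The sample `versions` entry below is invented for illustration, and the real script derives the commit hash via `git rev-parse HEAD` rather than taking it as a parameter:

```python
def generate_manifest(versions, git_commit):
    # DOI manifest header, shared across all entries.
    output = (
        "Maintainers: Apache Spark Developers (@ApacheSpark)\n"
        "GitRepo: https://github.com/apache/spark-docker.git\n\n"
    )
    entry = (
        "Tags: %s\n"
        "Architectures: amd64, arm64v8\n"
        "GitCommit: %s\n"
        "Directory: ./%s\n\n"
    )
    # One stanza per image variant listed in versions.json.
    for version in versions:
        output += entry % (", ".join(version["tags"]), git_commit, version["path"])
    return output

# Invented sample mimicking one versions.json entry:
versions = [
    {
        "path": "3.3.1/scala2.12-java11-python3-ubuntu",
        "tags": ["3.3.1-python3", "3.3.1", "latest"],
    }
]
print(generate_manifest(versions, "496edb6dee0ade08bc5d180d7a6da0ff8b5d91ff"))
```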
[spark] branch branch-3.2 updated: [SPARK-40270][PS][FOLLOWUP][3.2] Skip test_style when pandas <1.3.0
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 43402fdeb09 [SPARK-40270][PS][FOLLOWUP][3.2] Skip test_style when pandas <1.3.0 43402fdeb09 is described below commit 43402fdeb0942e518ec7f5561ddf3690ae5cac27 Author: Yikun Jiang AuthorDate: Fri Dec 9 22:15:48 2022 +0800 [SPARK-40270][PS][FOLLOWUP][3.2] Skip test_style when pandas <1.3.0 ### What changes were proposed in this pull request? According to https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html: `pandas.io.formats.style.Styler.to_latex` was introduced in 1.3.0, so the check should be skipped before pandas 1.3.0 ``` ERROR [0.180s]: test_style (pyspark.pandas.tests.test_dataframe.DataFrameTest) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", line 5795, in test_style check_style() File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", line 5793, in check_style self.assert_eq(pdf_style.to_latex(), psdf_style.to_latex()) AttributeError: 'Styler' object has no attribute 'to_latex' ``` Related: https://github.com/apache/spark/commit/58375a86e6ff49c5bcee49939fbd98eb848ae59f ### Why are the changes needed? This test breaks the 3.2 branch pyspark test (with python 3.6 + pandas 1.1.x), so I think it is better to add the `skipIf`. See also https://github.com/apache/spark/pull/38982#issuecomment-1343923114 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #39008 from Yikun/branch-3.2-style-check.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- python/pyspark/pandas/tests/test_dataframe.py | 4 1 file changed, 4 insertions(+) diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py index b4187d59ae7..15cadbebdb6 100644 --- a/python/pyspark/pandas/tests/test_dataframe.py +++ b/python/pyspark/pandas/tests/test_dataframe.py @@ -5774,6 +5774,10 @@ class DataFrameTest(PandasOnSparkTestCase, SQLTestUtils): for value_psdf, value_pdf in zip(psdf, pdf): self.assert_eq(value_psdf, value_pdf) +@unittest.skipIf( +LooseVersion(pd.__version__) < LooseVersion("1.3.0"), +"pandas support `Styler.to_latex` since 1.3.0", +) def test_style(self): # Currently, the `style` function returns a pandas object `Styler` as it is, # processing only the number of rows declared in `compute.max_rows`.
[spark] branch branch-3.3 updated: [SPARK-40270][PS][FOLLOWUP][3.3] Skip test_style when pandas <1.3.0
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new b6c6526e3b1 [SPARK-40270][PS][FOLLOWUP][3.3] Skip test_style when pandas <1.3.0 b6c6526e3b1 is described below commit b6c6526e3b1c5bd32b010a38cb0f4faeba678e22 Author: Yikun Jiang AuthorDate: Fri Dec 9 22:13:09 2022 +0800 [SPARK-40270][PS][FOLLOWUP][3.3] Skip test_style when pandas <1.3.0 ### What changes were proposed in this pull request? According to https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html: `pandas.io.formats.style.Styler.to_latex` was introduced in 1.3.0, so the check should be skipped before pandas 1.3.0 ``` ERROR [0.180s]: test_style (pyspark.pandas.tests.test_dataframe.DataFrameTest) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", line 5795, in test_style check_style() File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", line 5793, in check_style self.assert_eq(pdf_style.to_latex(), psdf_style.to_latex()) AttributeError: 'Styler' object has no attribute 'to_latex' ``` Related: https://github.com/apache/spark/commit/58375a86e6ff49c5bcee49939fbd98eb848ae59f ### Why are the changes needed? This test breaks the 3.2 branch pyspark test (with python 3.6 + pandas 1.1.x), so I think it is better to add the `skipIf`. See also https://github.com/apache/spark/pull/38982#issuecomment-1343923114 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - CI passed Closes #39007 from Yikun/branch-3.3-check.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- python/pyspark/pandas/tests/test_dataframe.py | 4 1 file changed, 4 insertions(+) diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py index 0a7eda77564..0c23bf07a69 100644 --- a/python/pyspark/pandas/tests/test_dataframe.py +++ b/python/pyspark/pandas/tests/test_dataframe.py @@ -6375,6 +6375,10 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils): psdf = ps.from_pandas(pdf) self.assert_eq(pdf.cov(), psdf.cov()) +@unittest.skipIf( +LooseVersion(pd.__version__) < LooseVersion("1.3.0"), +"pandas support `Styler.to_latex` since 1.3.0", +) def test_style(self): # Currently, the `style` function returns a pandas object `Styler` as it is, # processing only the number of rows declared in `compute.max_rows`.
[spark] branch master updated: [SPARK-40270][PS][FOLLOWUP] Skip test_style when pandas <1.3.0
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dd0bd0762b3 [SPARK-40270][PS][FOLLOWUP] Skip test_style when pandas <1.3.0 dd0bd0762b3 is described below commit dd0bd0762b344ab34e1b08c9bbd2ac77b83856e0 Author: Yikun Jiang AuthorDate: Fri Dec 9 22:11:03 2022 +0800 [SPARK-40270][PS][FOLLOWUP] Skip test_style when pandas <1.3.0 ### What changes were proposed in this pull request? According to https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.to_latex.html: `pandas.io.formats.style.Styler.to_latex` was introduced in 1.3.0, so the check should be skipped before pandas 1.3.0 ``` ERROR [0.180s]: test_style (pyspark.pandas.tests.test_dataframe.DataFrameTest) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", line 5795, in test_style check_style() File "/__w/spark/spark/python/pyspark/pandas/tests/test_dataframe.py", line 5793, in check_style self.assert_eq(pdf_style.to_latex(), psdf_style.to_latex()) AttributeError: 'Styler' object has no attribute 'to_latex' ``` Related: https://github.com/apache/spark/commit/58375a86e6ff49c5bcee49939fbd98eb848ae59f ### Why are the changes needed? This test breaks the 3.2 branch pyspark test (with python 3.6 + pandas 1.1.x), so I think it is better to add the `skipIf`. See also https://github.com/apache/spark/pull/38982#issuecomment-1343923114 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - CI passed - Tested on the 3.2 branch: https://github.com/Yikun/spark/pull/194, https://github.com/Yikun/spark/actions/runs/3655564439/jobs/6177030747 Closes #39002 from Yikun/skip-check.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- python/pyspark/pandas/tests/test_dataframe.py | 4 1 file changed, 4 insertions(+) diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py index 4e80c680b6e..ded110c1231 100644 --- a/python/pyspark/pandas/tests/test_dataframe.py +++ b/python/pyspark/pandas/tests/test_dataframe.py @@ -7074,6 +7074,10 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils): psdf = ps.from_pandas(pdf) self.assert_eq(pdf.cov(), psdf.cov()) +@unittest.skipIf( +LooseVersion(pd.__version__) < LooseVersion("1.3.0"), +"pandas support `Styler.to_latex` since 1.3.0", +) def test_style(self): # Currently, the `style` function returns a pandas object `Styler` as it is, # processing only the number of rows declared in `compute.max_rows`.
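The guard added above works because `unittest.skipIf` evaluates its condition once, at class-definition time, so the test body never runs on old pandas. A self-contained sketch; the version tuples below stand in for the real `LooseVersion(pd.__version__)` comparison:

```python
import unittest

# Stand-ins for LooseVersion(pd.__version__) < LooseVersion("1.3.0"):
# tuples compare element-wise, which is enough for this illustration.
INSTALLED_PANDAS = (1, 1, 5)   # hypothetical old pandas, e.g. on Python 3.6
REQUIRED = (1, 3, 0)           # first release with Styler.to_latex

class DataFrameStyleTest(unittest.TestCase):
    @unittest.skipIf(
        INSTALLED_PANDAS < REQUIRED,
        "pandas supports `Styler.to_latex` since 1.3.0",
    )
    def test_style(self):
        # On pandas < 1.3.0 this body never runs, so the missing
        # attribute can no longer raise AttributeError.
        self.fail("would exercise Styler.to_latex here")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(DataFrameStyleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # 1 -- the test was skipped, not failed
```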
[spark] 01/01: Update test_dataframe.py
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch branch-3.2-style-check in repository https://gitbox.apache.org/repos/asf/spark.git commit 49d31b0d860da90cf2f4ec696b3220f24355f65e Author: Yikun Jiang AuthorDate: Fri Dec 9 19:46:01 2022 +0800 Update test_dataframe.py --- python/pyspark/pandas/tests/test_dataframe.py | 4 1 file changed, 4 insertions(+) diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py index b4187d59ae7..15cadbebdb6 100644 --- a/python/pyspark/pandas/tests/test_dataframe.py +++ b/python/pyspark/pandas/tests/test_dataframe.py @@ -5774,6 +5774,10 @@ class DataFrameTest(PandasOnSparkTestCase, SQLTestUtils): for value_psdf, value_pdf in zip(psdf, pdf): self.assert_eq(value_psdf, value_pdf) +@unittest.skipIf( +LooseVersion(pd.__version__) < LooseVersion("1.3.0"), +"pandas support `Styler.to_latex` since 1.3.0", +) def test_style(self): # Currently, the `style` function returns a pandas object `Styler` as it is, # processing only the number of rows declared in `compute.max_rows`.
[spark] branch branch-3.2-style-check created (now 49d31b0d860)
This is an automated email from the ASF dual-hosted git repository. yikun pushed a change to branch branch-3.2-style-check in repository https://gitbox.apache.org/repos/asf/spark.git at 49d31b0d860 Update test_dataframe.py This branch includes the following new commits: new 49d31b0d860 Update test_dataframe.py The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[spark] branch branch-3.3 updated: [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 821997bec37 [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action 821997bec37 is described below commit 821997bec3703ec52db9b1deb667e11e76296c48 Author: Yikun Jiang AuthorDate: Fri Dec 2 22:44:50 2022 -0800 [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action ### What changes were proposed in this pull request? This patch makes the Spark K8s Volcano IT runnable in the resource-limited GitHub Action env. It will help downstream communities like Volcano enable the Spark IT tests in GitHub Action. BTW, there is no plan to enable the Volcano test in the Spark community; this patch only makes the test work but **DOES NOT** enable the Volcano test in the Apache Spark GA, which will help downstream testing. - Change the parallel job number from 4 to 2 (only 1 job in each queue) if in the GitHub Action env. - Get the specified `spark.kubernetes.[driver|executor].request.cores` - Set the queue limit according to the specified [driver|executor].request.cores, just like we did in the normal test: https://github.com/apache/spark/commit/883a481e44a1f91ef3fc3aea2838a598cbd6cf0f ### Why are the changes needed? It helps downstream communities who want to use free GitHub-Action-hosted resources to enable the Spark IT tests in GitHub Action. ### Does this PR introduce _any_ user-facing change? No, test only. ### How was this patch tested?
- Test on my local env with enough resource (default): ``` $ build/sbt -Pvolcano -Pkubernetes -Pkubernetes-integration-tests -Dtest.include.tags=volcano "kubernetes-integration-tests/test" [info] KubernetesSuite: [info] VolcanoSuite: [info] - Run SparkPi with volcano scheduler (10 seconds, 410 milliseconds) [info] - SPARK-38187: Run SparkPi Jobs with minCPU (25 seconds, 489 milliseconds) [info] - SPARK-38187: Run SparkPi Jobs with minMemory (25 seconds, 518 milliseconds) [info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (14 seconds, 349 milliseconds) [info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (23 seconds, 516 milliseconds) [info] - SPARK-38423: Run driver job to validate priority order (16 seconds, 404 milliseconds) [info] YuniKornSuite: [info] Run completed in 2 minutes, 34 seconds. [info] Total number of tests run: 6 [info] Suites: completed 3, aborted 0 [info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. 
[success] Total time: 439 s (07:19), completed 2022-12-3 8:58:50 ``` - Test on Github Action with `volcanoMaxConcurrencyJobNum`: https://github.com/Yikun/spark/pull/192 ``` $ build/sbt -Pvolcano -Psparkr -Pkubernetes -Pkubernetes-integration-tests -Dspark.kubernetes.test.driverRequestCores=0.5 -Dspark.kubernetes.test.executorRequestCores=0.2 -Dspark.kubernetes.test.volcanoMaxConcurrencyJobNum=1 -Dtest.include.tags=volcano "kubernetes-integration-tests/test" [info] VolcanoSuite: [info] - Run SparkPi with volcano scheduler (18 seconds, 122 milliseconds) [info] - SPARK-38187: Run SparkPi Jobs with minCPU (53 seconds, 964 milliseconds) [info] - SPARK-38187: Run SparkPi Jobs with minMemory (54 seconds, 523 milliseconds) [info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (22 seconds, 185 milliseconds) [info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (33 seconds, 349 milliseconds) [info] - SPARK-38423: Run driver job to validate priority order (32 seconds, 435 milliseconds) [info] YuniKornSuite: [info] Run completed in 4 minutes, 16 seconds. [info] Total number of tests run: 6 [info] Suites: completed 3, aborted 0 [info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0 [info] All tests passed. [warn] In the last 494 seconds, 7.296 (1.5%) were spent in GC. [Heap: 3.12GB free of 3.83GB, max 3.83GB] Consider increasing the JVM heap using `-Xmx` or try a different collector, e.g. `-XX:+UseG1GC`, for better performance. [success] Total time: 924 s (15:24), completed Dec 3, 2022 12:49:42 AM ``` - CI passed Closes #38789 from Yikun/SPARK-41253. 
Authored-by: Yikun Jiang Signed-off-by: Dongjoon Hyun (cherry picked from commit 72d58d5f8a847bac53cf01b137780c7e4e2664d7) Signed-off-by: Yikun Jiang --- .../kubernetes/integration-tests/README.md | 8 .../volcano/driver-podgroup-template-cpu-2u.yml| 23 -- .../deploy/k8s/integrationtest/TestConstants.scala | 2 + .../k8s/integrationtest/VolcanoTestsSuite.scala| 52 +- 4 files changed, 51 insertions(+), 3
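The queue sizing described above ("set queue limit according to the specified request.cores") can be sketched as a hypothetical helper; the function and the executor count are ours for illustration, and only the 0.5/0.2 core values come from the quoted CI command:

```python
def queue_cpu_capability(driver_cores, executor_cores, executors_per_job, concurrent_jobs):
    # CPU capacity a Volcano queue needs so that exactly
    # `concurrent_jobs` SparkPi jobs fit in it at once.
    per_job = driver_cores + executor_cores * executors_per_job
    return per_job * concurrent_jobs

# GitHub Action settings from the test run above:
#   driverRequestCores=0.5, executorRequestCores=0.2, 1 executor, 1 job per queue
print(queue_cpu_capability(0.5, 0.2, 1, 1))
```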
[spark] branch branch-3.3 updated: [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 20cc2b6104e [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT 20cc2b6104e is described below commit 20cc2b6104e1670be3295ed52be54bb40de1b1ce Author: Yikun Jiang AuthorDate: Thu Aug 11 08:28:57 2022 -0700 [SPARK-38921][K8S][TESTS] Use k8s-client to create queue resource in Volcano IT ### What changes were proposed in this pull request? Use the fabric8io/k8s-client to create the queue resource in the Volcano IT. ### Why are the changes needed? Use the k8s-client to create Volcano queues to - Make the code easier to understand - Enable the ability to set queue capacity dynamically. This will help support running the Volcano test in a resource-limited env (such as GitHub Action). ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Volcano IT passed Closes #36219 from Yikun/SPARK-38921.
Authored-by: Yikun Jiang Signed-off-by: Dongjoon Hyun (cherry picked from commit a49f66fe49d4d4bbfb41da2e5bbb5af4bd64d1da) Signed-off-by: Yikun Jiang --- .../src/test/resources/volcano/disable-queue.yml | 24 --- .../volcano/disable-queue0-enable-queue1.yml | 31 - .../volcano/driver-podgroup-template-cpu-2u.yml| 2 +- .../volcano/driver-podgroup-template-memory-3g.yml | 2 +- .../src/test/resources/volcano/enable-queue.yml| 24 --- .../volcano/enable-queue0-enable-queue1.yml| 29 - .../src/test/resources/volcano/queue-2u-3g.yml | 25 .../k8s/integrationtest/VolcanoTestsSuite.scala| 74 +++--- 8 files changed, 52 insertions(+), 159 deletions(-) diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml deleted file mode 100644 index d9f8c36471e..000 --- a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue.yml +++ /dev/null @@ -1,24 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue -spec: - weight: 1 - capability: -cpu: "0.001" diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml deleted file mode 100644 index 82e479478cc..000 --- a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/disable-queue0-enable-queue1.yml +++ /dev/null @@ -1,31 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue0 -spec: - weight: 1 - capability: -cpu: "0.001" -apiVersion: scheduling.volcano.sh/v1beta1 -kind: Queue -metadata: - name: queue1 -spec: - weight: 1 diff --git a/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/driver-podgroup-template-cpu-2u.yml b/resource-managers/kubernetes/integration-tests/src/test/resources/volcano/driver
[spark-docker] branch master updated: [SPARK-41287][INFRA] Add test workflow to help self-build image test in fork repo
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new cfcbeac [SPARK-41287][INFRA] Add test workflow to help self-build image test in fork repo cfcbeac is described below commit cfcbeac5d2b922a5ee7dfd2b4a5cf08072c827b7 Author: Yikun Jiang AuthorDate: Mon Nov 28 17:55:18 2022 +0800 [SPARK-41287][INFRA] Add test workflow to help self-build image test in fork repo ### What changes were proposed in this pull request? This patch adds a test workflow to help fork repo to test image in their fork repos. ![image](https://user-images.githubusercontent.com/1736354/204183109-e2341397-251e-42a0-b5f7-c1c1f9334ff9.png) such like: - https://github.com/Yikun/spark-docker/actions/runs/3552072792/jobs/5966742869 - https://github.com/Yikun/spark-docker/actions/runs/3561513498/jobs/5982485960 ### Why are the changes needed? Help devs/users test their own image in their fork repo ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Test in my fork repo: https://github.com/Yikun/spark-docker/actions/workflows/test.yml Closes #26 from Yikun/test-workflow. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/main.yml | 28 +++-- .github/workflows/publish.yml | 2 +- .github/workflows/{publish.yml => test.yml} | 62 - 3 files changed, 60 insertions(+), 32 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index ebafcdc..fd37990 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -37,13 +37,18 @@ on: required: true type: string default: 11 + build: +description: Build the image or not. +required: false +type: boolean +default: true publish: description: Publish the image or not. 
required: false type: boolean default: false repository: -description: The registry to be published (Avaliable only when publish is selected). +description: The registry to be published/tested. (Available only in publish/test workflow) required: false type: string default: ghcr.io/apache/spark-docker @@ -52,6 +57,11 @@ on: required: false type: string default: python + image-tag: +type: string +description: The image tag to be tested. (Available only in test workflow) +required: false +default: latest jobs: main: @@ -83,11 +93,18 @@ jobs: esac TAG=scala${{ inputs.scala }}-java${{ inputs.java }}-$SUFFIX - REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]') - TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker IMAGE_NAME=spark IMAGE_PATH=${{ inputs.spark }}/$TAG - UNIQUE_IMAGE_TAG=${{ inputs.spark }}-$TAG + if [ "${{ inputs.build }}" == "true" ]; then +# Use the local registry to build and test +REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]') +TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker +UNIQUE_IMAGE_TAG=${{ inputs.spark }}-$TAG + else +# Use specified {repository}/spark:{image-tag} image to test +TEST_REPO=${{ inputs.repository }} +UNIQUE_IMAGE_TAG=${{ inputs.image-tag }} + fi IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG PUBLISH_REPO=${{ inputs.repository }} @@ -119,15 +136,18 @@ jobs: echo "PUBLISH_IMAGE_URL:"${PUBLISH_IMAGE_URL} - name: Build - Set up QEMU +if: ${{ inputs.build }} uses: docker/setup-qemu-action@v2 - name: Build - Set up Docker Buildx +if: ${{ inputs.build }} uses: docker/setup-buildx-action@v2 with: # This required by local registry driver-opts: network=host - name: Build - Build and push test image +if: ${{ inputs.build }} uses: docker/build-push-action@v3 with: context: ${{ env.IMAGE_PATH }} diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 4a07f5d..2941cfb 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -36,7 
+36,7 @@ on: type: boolean required: true repository: -description: The registry to be published (Avaliable only when publish is true). +description: The registry to be published (Available only when publish is true). required
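The branch added to `main.yml` above switches between a local registry (build mode) and a caller-specified image (test mode). A sketch of that selection logic in Python, with illustrative function and parameter names (the workflow itself does this in shell):

```python
def resolve_test_image(build, spark, tag, repo_owner,
                       repository="ghcr.io/apache/spark-docker",
                       image_tag="latest"):
    """Mirror the workflow's IMAGE_URL computation (illustrative sketch)."""
    if build:
        # Build mode: push to the local registry, tagged per Spark version.
        test_repo = "localhost:5000/%s/spark-docker" % repo_owner.lower()
        unique_tag = "%s-%s" % (spark, tag)
    else:
        # Test mode: use the specified {repository}/spark:{image-tag} image.
        test_repo = repository
        unique_tag = image_tag
    return "%s/spark:%s" % (test_repo, unique_tag)
```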
[spark-docker] branch master updated: [SPARK-41269][INFRA] Move image matrix into version's workflow
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new d58e178 [SPARK-41269][INFRA] Move image matrix into version's workflow d58e178 is described below commit d58e17890f07b4c8c8d212775a53c48dc3a6ce42 Author: Yikun Jiang AuthorDate: Mon Nov 28 09:36:54 2022 +0800 [SPARK-41269][INFRA] Move image matrix into version's workflow ### What changes were proposed in this pull request? This patch refactors main workflow: - Move image matrix into version's workflow to make the main workflow more clear. And also will help downstream repo to only validate specified image type. - Move build steps into a same section ### Why are the changes needed? This will help downstream repo to only validate specified image type. After this patch, we will add a test to reuse spark docker workflow like: https://github.com/yikun/spark-docker/commit/45044cee2e8919de7e7353e74f8ca612ad16629a to help developers/users test their self build image. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #25 from Yikun/matrix-refactor. 
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.3.0.yaml | 4 ++ .github/workflows/build_3.3.1.yaml | 4 ++ .github/workflows/main.yml | 76 -- .github/workflows/publish.yml | 2 + 4 files changed, 51 insertions(+), 35 deletions(-) diff --git a/.github/workflows/build_3.3.0.yaml b/.github/workflows/build_3.3.0.yaml index 7e7ce39..a4f8224 100644 --- a/.github/workflows/build_3.3.0.yaml +++ b/.github/workflows/build_3.3.0.yaml @@ -30,6 +30,9 @@ on: jobs: run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] name: Run secrets: inherit uses: ./.github/workflows/main.yml @@ -37,3 +40,4 @@ jobs: spark: 3.3.0 scala: 2.12 java: 11 + image-type: ${{ matrix.image-type }} diff --git a/.github/workflows/build_3.3.1.yaml b/.github/workflows/build_3.3.1.yaml index f6a4b7d..9e5c082 100644 --- a/.github/workflows/build_3.3.1.yaml +++ b/.github/workflows/build_3.3.1.yaml @@ -30,6 +30,9 @@ on: jobs: run-build: +strategy: + matrix: +image-type: ["all", "python", "scala", "r"] name: Run secrets: inherit uses: ./.github/workflows/main.yml @@ -37,3 +40,4 @@ jobs: spark: 3.3.1 scala: 2.12 java: 11 + image-type: ${{ matrix.image-type }} diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 024b853..ebafcdc 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -47,6 +47,11 @@ on: required: false type: string default: ghcr.io/apache/spark-docker + image-type: +description: The image type of the image (all, python, scala, r). 
+required: false +type: string +default: python jobs: main: @@ -60,41 +65,33 @@ jobs: image: registry:2 ports: - 5000:5000 -strategy: - matrix: -spark_version: - - ${{ inputs.spark }} -scala_version: - - ${{ inputs.scala }} -java_version: - - ${{ inputs.java }} -image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu] steps: - name: Checkout Spark Docker repository uses: actions/checkout@v3 - - name: Set up QEMU -uses: docker/setup-qemu-action@v2 - - - name: Set up Docker Buildx -uses: docker/setup-buildx-action@v2 -with: - # This required by local registry - driver-opts: network=host - - - name: Generate tags + - name: Prepare - Generate tags run: | - TAG=scala${{ matrix.scala_version }}-java${{ matrix.java_version }}-${{ matrix.image_suffix }} + case "${{ inputs.image-type }}" in + all) SUFFIX=python3-r-ubuntu + ;; + python) SUFFIX=python3-ubuntu + ;; + r) SUFFIX=r-ubuntu + ;; + scala) SUFFIX=ubuntu + ;; + esac + TAG=scala${{ inputs.scala }}-java${{ inputs.java }}-$SUFFIX REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]') TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker IMAGE_NAME=spark - IMAGE_PATH=${{ matrix.spark_version }}/$TAG - UNIQUE_IMAGE_TAG=${{ matrix.spark_version }}-$TAG + IMAGE_PATH=${{ inputs.spark }}/$TAG + UNIQUE_IMAGE_TAG=${{ inputs.s
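The `case` block introduced above maps the `image-type` input to a Dockerfile suffix and builds the image tag from it. The same mapping, sketched in Python (a translation of the shell logic, not code from the repo):

```python
# Suffixes from the workflow's case statement.
IMAGE_SUFFIXES = {
    "all": "python3-r-ubuntu",
    "python": "python3-ubuntu",
    "r": "r-ubuntu",
    "scala": "ubuntu",
}

def image_tag(scala, java, image_type):
    """TAG=scala{scala}-java{java}-{suffix}, as generated by the workflow."""
    return "scala%s-java%s-%s" % (scala, java, IMAGE_SUFFIXES[image_type])
```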
[spark-docker] branch master updated: [SPARK-41258][INFRA] Upgrade docker and actions to clean up warnings
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 33abc18 [SPARK-41258][INFRA] Upgrade docker and actions to clean up warnings 33abc18 is described below commit 33abc1894f3de135e827ce393842ca355229c117 Author: Yikun Jiang AuthorDate: Fri Nov 25 14:57:27 2022 +0800 [SPARK-41258][INFRA] Upgrade docker and actions to clean up warnings ### What changes were proposed in this pull request? - Upgrade `actions/checkout` from v2 to v3 - Upgrade `docker/build-push-action` from v2 to v3 ### Why are the changes needed? Clean up the set-output and lower-version Node warnings ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Test passed Closes #24 from Yikun/upgrade-actions. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/main.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index dfb99e9..024b853 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -71,7 +71,7 @@ jobs: image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu] steps: - name: Checkout Spark Docker repository -uses: actions/checkout@v2 +uses: actions/checkout@v3 - name: Set up QEMU uses: docker/setup-qemu-action@v2 @@ -122,7 +122,7 @@ jobs: echo "PUBLISH_IMAGE_URL:"${PUBLISH_IMAGE_URL} - name: Build and push test image -uses: docker/build-push-action@v2 +uses: docker/build-push-action@v3 with: context: ${{ env.IMAGE_PATH }} tags: ${{ env.IMAGE_URL }} @@ -258,7 +258,7 @@ jobs: - name: Publish - Push Image if: ${{ inputs.publish }} -uses: docker/build-push-action@v2 +uses: docker/build-push-action@v3 with: context: ${{ env.IMAGE_PATH }} push: true - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail:
commits-h...@spark.apache.org
[spark] branch master updated (a205e97ad9a -> 575b8f00faf)
This is an automated email from the ASF dual-hosted git repository. yikun pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from a205e97ad9a [SPARK-41230][CONNECT][PYTHON] Remove `str` from Aggregate expression type add 575b8f00faf [SPARK-41257][INFRA] Upgrade actions/labeler to v4 No new revisions were added by this update. Summary of changes: .github/workflows/labeler.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (033dbe604bc -> 71b5c5bde75)
This is an automated email from the ASF dual-hosted git repository. yikun pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 033dbe604bc [SPARK-41247][BUILD] Unify the Protobuf versions in Spark connect and Protobuf connector add 71b5c5bde75 [SPARK-41251][PS][INFRA] Upgrade pandas from 1.5.1 to 1.5.2 No new revisions were added by this update. Summary of changes: dev/infra/Dockerfile | 4 ++-- python/pyspark/pandas/supported_api_gen.py | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated (246479c8c5c -> 6e6e8560557)
This is an automated email from the ASF dual-hosted git repository. yikun pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 246479c8c5c [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python add 6e6e8560557 [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest No new revisions were added by this update. Summary of changes: dev/infra/Dockerfile| 12 +--- python/pyspark/pandas/mlflow.py | 4 ++-- 2 files changed, 7 insertions(+), 9 deletions(-)
[spark] branch master updated: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 12a77bb22f1 [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI 12a77bb22f1 is described below commit 12a77bb22f1689e361a5efe2d7000aead74ebc43 Author: Xinrong Meng AuthorDate: Fri Nov 18 17:12:39 2022 +0800 [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI ### What changes were proposed in this pull request? Install [memory-profiler](https://pypi.org/project/memory-profiler/) in the CI in order to enable memory profiling tests. ### Why are the changes needed? That's a sub-task of [SPARK-40281](https://issues.apache.org/jira/browse/SPARK-40281) Memory Profiler on Executors. PySpark memory profiler depends on memory-profiler. The PR proposes to install memory-profiler in the CI to enable related tests. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #38611 from xinrong-meng/ci_mp. Lead-authored-by: Xinrong Meng Co-authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- dev/infra/Dockerfile | 3 +++ python/pyspark/tests/test_memory_profiler.py | 8 +--- python/pyspark/tests/test_profiler.py| 2 ++ 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 96b20894b87..a6331c2ead4 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -68,3 +68,6 @@ ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library # Add Python deps for Spark Connect. 
RUN python3.9 -m pip install grpcio protobuf + +# SPARK-41186: Move memory-profiler to pyspark deps install when mlfow doctest test fix +RUN python3.9 -m pip install 'memory-profiler==0.60.0' diff --git a/python/pyspark/tests/test_memory_profiler.py b/python/pyspark/tests/test_memory_profiler.py index 7da82dccb37..3dc8ce4ce22 100644 --- a/python/pyspark/tests/test_memory_profiler.py +++ b/python/pyspark/tests/test_memory_profiler.py @@ -27,17 +27,11 @@ from unittest import mock import pandas as pd from pyspark import SparkConf, SparkContext +from pyspark.profiler import has_memory_profiler from pyspark.sql import SparkSession from pyspark.sql.functions import pandas_udf, udf from pyspark.testing.utils import PySparkTestCase -try: -import memory_profiler # type: ignore[import] # noqa: F401 - -has_memory_profiler = True -except Exception: -has_memory_profiler = False - @unittest.skipIf(not has_memory_profiler, "Must have memory-profiler installed.") class MemoryProfilerTests(PySparkTestCase): diff --git a/python/pyspark/tests/test_profiler.py b/python/pyspark/tests/test_profiler.py index ceae904ca6f..8a078d36b46 100644 --- a/python/pyspark/tests/test_profiler.py +++ b/python/pyspark/tests/test_profiler.py @@ -22,6 +22,7 @@ import unittest from io import StringIO from pyspark import SparkConf, SparkContext, BasicProfiler +from pyspark.profiler import has_memory_profiler from pyspark.sql import SparkSession from pyspark.sql.functions import udf from pyspark.sql.utils import PythonException @@ -126,6 +127,7 @@ class ProfilerTests2(unittest.TestCase): finally: sc.stop() +@unittest.skipIf(has_memory_profiler, "Test when memory-profiler is not installed.") def test_no_memory_profile_installed(self): sc = SparkContext( conf=SparkConf()
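The diff above centralizes the try/except import probe into a single `has_memory_profiler` flag in `pyspark.profiler`, so tests can be skipped in either direction. The pattern generalizes to any optional test dependency; a small standalone sketch:

```python
import importlib

def optional_dependency_available(module_name):
    """Return True if an optional module imports cleanly.

    Mirrors the has_memory_profiler flag: compute availability once,
    then gate tests on it rather than repeating try/except imports.
    """
    try:
        importlib.import_module(module_name)
        return True
    except Exception:
        return False

# Tests can then skip in both directions, e.g.:
#   @unittest.skipIf(not flag, "Must have memory-profiler installed.")
#   @unittest.skipIf(flag, "Test when memory-profiler is not installed.")
```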
[spark-docker] branch master updated: [SPARK-40519] Add "Publish" workflow to help release apache/spark image
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new f488d73 [SPARK-40519] Add "Publish" workflow to help release apache/spark image f488d73 is described below commit f488d732d254caa78c1e1a2ef74958e6c867dad6 Author: Yikun Jiang AuthorDate: Tue Nov 15 21:32:30 2022 +0800 [SPARK-40519] Add "Publish" workflow to help release apache/spark image ### What changes were proposed in this pull request? The publish step will include 3 steps: 1. First build the local image. 2. Pass related test (K8s test / Standalone test) using image of first step. 3. After pass all test, will publish to `ghcr` (This might help RC test) or `dockerhub` It's about 30-40 mins to publish all images. Add "Publish" workflow to help release apache/spark image. ![image](https://user-images.githubusercontent.com/1736354/201015477-30428444-0ed5-4436-8b59-7420c678c4a6.png) ### Why are the changes needed? One click to create the `apache/spark` image. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. Set default branch in my fork repo 2. Run workflow manually, https://github.com/Yikun/spark-docker/actions/workflows/publish.yml?query=is%3Asuccess Closes #23 from Yikun/workflow. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/main.yml| 43 +++ .github/workflows/publish.yml | 66 ++ tools/manifest.py | 82 +++ versions.json | 64 + 4 files changed, 255 insertions(+) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index accf8ae..dfb99e9 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -37,6 +37,16 @@ on: required: true type: string default: 11 + publish: +description: Publish the image or not.
+required: false +type: boolean +default: false + repository: +description: The registry to be published (Avaliable only when publish is selected). +required: false +type: string +default: ghcr.io/apache/spark-docker jobs: main: @@ -83,6 +93,9 @@ jobs: UNIQUE_IMAGE_TAG=${{ matrix.spark_version }}-$TAG IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG + PUBLISH_REPO=${{ inputs.repository }} + PUBLISH_IMAGE_URL=`tools/manifest.py tags -i ${PUBLISH_REPO}/${IMAGE_NAME} -p ${{ matrix.spark_version }}/${TAG}` + # Unique image tag in each version: 3.3.0-scala2.12-java11-python3-ubuntu echo "UNIQUE_IMAGE_TAG=${UNIQUE_IMAGE_TAG}" >> $GITHUB_ENV # Test repo: ghcr.io/apache/spark-docker @@ -94,6 +107,9 @@ jobs: # Image URL: ghcr.io/apache/spark-docker/spark:3.3.0-scala2.12-java11-python3-ubuntu echo "IMAGE_URL=${IMAGE_URL}" >> $GITHUB_ENV + echo "PUBLISH_REPO=${PUBLISH_REPO}" >> $GITHUB_ENV + echo "PUBLISH_IMAGE_URL=${PUBLISH_IMAGE_URL}" >> $GITHUB_ENV + - name: Print Image tags run: | echo "UNIQUE_IMAGE_TAG: "${UNIQUE_IMAGE_TAG} @@ -102,6 +118,9 @@ jobs: echo "IMAGE_PATH: "${IMAGE_PATH} echo "IMAGE_URL: "${IMAGE_URL} + echo "PUBLISH_REPO:"${PUBLISH_REPO} + echo "PUBLISH_IMAGE_URL:"${PUBLISH_IMAGE_URL} + - name: Build and push test image uses: docker/build-push-action@v2 with: @@ -221,3 +240,27 @@ jobs: with: name: spark-on-kubernetes-it-log path: "**/target/integration-tests.log" + + - name: Publish - Login to GitHub Container Registry +if: ${{ inputs.publish }} +uses: docker/login-action@v2 +with: + registry: ghcr.io + username: ${{ github.actor }} + password: ${{ secrets.GITHUB_TOKEN }} + + - name: Publish - Login to Dockerhub Registry +if: ${{ inputs.publish }} +uses: docker/login-action@v2 +with: + username: ${{ secrets.DOCKERHUB_USER }} + password: ${{ secrets.DOCKERHUB_TOKEN }} + + - name: Publish - Push Image +if: ${{ inputs.publish }} +uses: docker/build-push-action@v2 +with: + context: ${{ env.IMAGE_PATH }} + push: true + tags: ${{ env.PUBLISH_IMAGE_URL }} + 
platforms: linux/amd64,linux/arm64 diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml new file mode 100644 index 000..a44153b
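The three publish steps described above (build the image locally, run the K8s and standalone tests against it, and only then push to `ghcr` or Docker Hub) amount to a gated pipeline. A hedged sketch of that control flow, where the callables stand in for the actual workflow jobs:

```python
def publish_pipeline(build, test, push, publish=False):
    """Build a local image, verify it, and only push when the tests
    pass and publishing was explicitly requested (illustrative sketch)."""
    image = build()
    if not test(image):
        # Failing tests must never reach a public registry.
        raise RuntimeError("tests failed; refusing to publish %s" % image)
    if publish:
        push(image)
        return "published"
    return "tested-only"
```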
[spark-docker] branch master updated: [SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 52152c1 [SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker 52152c1 is described below commit 52152c1b6d70acc2e7c5e32bffe0265b55df7b6f Author: Qian.Sun AuthorDate: Wed Nov 9 09:34:47 2022 +0800 [SPARK-40569][TESTS] Add smoke test in standalone cluster for spark-docker ### What changes were proposed in this pull request? This PR aims to add smoke test in standalone cluster for spark-docker repo. ### Why are the changes needed? Verify spark docker works normally in standalone cluster. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? New test in GA. Closes #21 from dcoliversun/SPARK-40569. Authored-by: Qian.Sun Signed-off-by: Yikun Jiang --- .github/workflows/main.yml | 3 + testing/run_tests.sh | 25 ++ testing/testing.sh | 207 + 3 files changed, 235 insertions(+) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 08bba68..accf8ae 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -110,6 +110,9 @@ jobs: platforms: linux/amd64,linux/arm64 push: true + - name : Test - Run spark application for standalone cluster on docker +run: testing/run_tests.sh --image-url $IMAGE_URL --scala-version ${{ matrix.scala_version }} --spark-version ${{ matrix.spark_version }} + - name: Test - Checkout Spark repository uses: actions/checkout@v3 with: diff --git a/testing/run_tests.sh b/testing/run_tests.sh new file mode 100755 index 000..c612dcd --- /dev/null +++ b/testing/run_tests.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +set -eo errexit + +SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd) + +. "${SCRIPT_DIR}/testing.sh" + +echo "Test successfully finished" diff --git a/testing/testing.sh b/testing/testing.sh new file mode 100755 index 000..d399d6d --- /dev/null +++ b/testing/testing.sh @@ -0,0 +1,207 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# This test script runs a simple smoke test in standalone cluster: +# - create docker network +# - start up a master +# - start up a worker +# - wait for the web UI endpoint to return successfully +# - run a simple smoke test in standalone cluster +# - clean up test resource + +CURL_TIMEOUT=1 +CURL_COOLDOWN=1 +CURL_MAX_TRIES=30 + +NETWORK_NAME=spark-net-bridge + +SUBMIT_CONTAINER_NAME=spark-submit +MASTER_CONTAINER_NAME=spark-master +WORKER_CONTAINER_NAME=spark-worker +SPARK_MASTER_PORT=7077 +SPARK_MASTER_WEBUI_CONTAINER_PORT=8080 +SPARK_MASTER_WEBUI_HOST_PORT=8080 +SPARK_WORKER_WEBUI_CONTAINER_PORT=8081 +SPARK_WORKER_WEBUI_HOST_PORT=8081 + +SCALA_VERSION="2.12" +SPARK_VERSION="3.3.0" +IMAGE_URL= + +# Create a new docker bridge network +function create_network() { + if [ ! -z $(docker network ls --filter name=^${NETWORK_NAME}$ --format="{{ .Name }}") ]; then +# bridge network already exists, need to kill containers attac
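The smoke test above waits for the master and worker web UI endpoints by polling with curl (`CURL_MAX_TRIES` attempts, `CURL_COOLDOWN` seconds apart). The same retry loop, sketched in Python rather than the script's shell:

```python
import time

def wait_until(check, max_tries=30, cooldown=1.0):
    """Poll check() up to max_tries times, sleeping cooldown seconds
    between attempts; return True as soon as it succeeds."""
    for attempt in range(max_tries):
        if check():
            return True
        if attempt < max_tries - 1:
            time.sleep(cooldown)
    return False
```

In the script's terms, `check` would be a curl probe of the web UI port, and a `False` result after all tries fails the test run.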
[spark-docker] branch master updated: [SPARK-40969] Replace spark TGZ url with apache archive url
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 243ce20 [SPARK-40969] Replace spark TGZ url with apache archive url 243ce20 is described below commit 243ce201296c20ae48b32a87d254800e8ad197ef Author: Qian.Sun AuthorDate: Tue Nov 1 11:14:04 2022 +0800 [SPARK-40969] Replace spark TGZ url with apache archive url ### What changes were proposed in this pull request? This PR aims to replace spark TGZ url with apache archive url. ### Why are the changes needed? ``` #13 [linux/amd64 4/9] RUN set -ex; export SPARK_TMP="$(mktemp -d)"; cd $SPARK_TMP; wget -nv -O spark.tgz "https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz;; wget -nv -O spark.tgz.asc "https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc;; export GNUPGHOME="$(mktemp -d)"; gpg --keyserver hkps://keys.openpgp.org --recv-key "80FB8EBE8EBA68504989703491B5DC815DBF10D3" || gpg --keyserver hkps://keyserver.ubuntu.com [...] #0 0.132 ++ mktemp -d #0 0.133 + export SPARK_TMP=/tmp/tmp.oEdW8CyP9h #0 0.133 + SPARK_TMP=/tmp/tmp.oEdW8CyP9h #0 0.133 + cd /tmp/tmp.oEdW8CyP9h #0 0.133 + wget -nv -O spark.tgz https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz #0 0.152 https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz: #0 0.152 2022-10-31 04:06:44 ERROR 404: Not Found. #13 ERROR: process "/bin/sh -c set -ex; export SPARK_TMP=\"$(mktemp -d)\"; cd $SPARK_TMP; wget -nv -O spark.tgz \"$SPARK_TGZ_URL\"; wget -nv -O spark.tgz.asc \"$SPARK_TGZ_ASC_URL\"; export GNUPGHOME=\"$(mktemp -d)\"; gpg --keyserver hkps://keys.openpgp.org --recv-key \"$GPG_KEY\" || gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys \"$GPG_KEY\"; gpg --batch --verify spark.tgz.asc spark.tgz; gpgconf --kill all; rm -rf \"$GNUPGHOME\" spark.t [...] 
``` Old url `https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz` is not found. Better to use unity apache archive url. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No need to add new tests. Closes #22 from dcoliversun/SPARK-40969. Authored-by: Qian.Sun Signed-off-by: Yikun Jiang --- 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 4 ++-- 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile | 4 ++-- 3.3.0/scala2.12-java11-r-ubuntu/Dockerfile | 4 ++-- 3.3.0/scala2.12-java11-ubuntu/Dockerfile | 4 ++-- 3.3.1/scala2.12-java11-python3-r-ubuntu/Dockerfile | 4 ++-- 3.3.1/scala2.12-java11-python3-ubuntu/Dockerfile | 4 ++-- 3.3.1/scala2.12-java11-r-ubuntu/Dockerfile | 4 ++-- 3.3.1/scala2.12-java11-ubuntu/Dockerfile | 4 ++-- Dockerfile.template| 4 ++-- 9 files changed, 18 insertions(+), 18 deletions(-) diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile index 8c2761e..fb48b80 100644 --- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile +++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile @@ -42,8 +42,8 @@ RUN set -ex && \ # Install Apache Spark # https://downloads.apache.org/spark/KEYS -ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \ - SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \ +ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \ + SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \ GPG_KEY=80FB8EBE8EBA68504989703491B5DC815DBF10D3 RUN set -ex; \ diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile index 6a0017a..1b6a02c 100644 --- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile +++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile @@ -41,8 +41,8 @@ RUN set -ex && \ # Install Apache Spark # 
https://downloads.apache.org/spark/KEYS -ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \ - SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \ +ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \ + SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \ GPG_KEY=80FB8EBE8EBA68504989703491B5DC815DBF10D3 RUN set -ex; \ diff --git a/3.3.0/sca
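The fix above swaps `dlcdn.apache.org`, which only hosts current releases (hence the 404 once 3.3.0 rotated out), for `archive.apache.org`, which retains every release. The URL scheme used in the Dockerfiles, wrapped in an illustrative helper:

```python
def spark_tgz_urls(version, hadoop="hadoop3"):
    """Stable tarball and signature URLs for a released Spark version;
    archive.apache.org keeps all past releases, unlike the CDN mirrors."""
    base = ("https://archive.apache.org/dist/spark/spark-%s/"
            "spark-%s-bin-%s" % (version, version, hadoop))
    return base + ".tgz", base + ".tgz.asc"
```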
[spark] branch master updated: [SPARK-40229][PS][TEST][FOLLOWUP] Add `openpyxl` to `requirements.txt`
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5c9843db2b3 [SPARK-40229][PS][TEST][FOLLOWUP] Add `openpyxl` to `requirements.txt` 5c9843db2b3 is described below commit 5c9843db2b3ddec0b03374df03dcaa1847941c34 Author: Dongjoon Hyun AuthorDate: Fri Oct 28 19:05:38 2022 +0800 [SPARK-40229][PS][TEST][FOLLOWUP] Add `openpyxl` to `requirements.txt` ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/37671. ### Why are the changes needed? Since https://github.com/apache/spark/pull/37671 added `openpyxl` for PySpark test environments and re-enabled `test_to_excel` test, we need to add it to `requirements.txt` as PySpark test dependency explicitly. ### Does this PR introduce _any_ user-facing change? No. This is a test dependency. ### How was this patch tested? Manually. Closes #38425 from dongjoon-hyun/SPARK-40229. Authored-by: Dongjoon Hyun Signed-off-by: Yikun Jiang --- dev/requirements.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/dev/requirements.txt b/dev/requirements.txt index fa4b6752f14..2f32066d6a8 100644 --- a/dev/requirements.txt +++ b/dev/requirements.txt @@ -13,6 +13,7 @@ matplotlib<3.3.0 # PySpark test dependencies unittest-xml-reporting +openpyxl # PySpark test dependencies (optional) coverage
[spark-docker] branch master updated: [SPARK-40855] Add CONTRIBUTING.md for apache/spark-docker
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new f6bab6b  [SPARK-40855] Add CONTRIBUTING.md for apache/spark-docker

f6bab6b is described below

commit f6bab6be5ddcd41d2b6c1b0c139316bc311e13aa
Author: Qian.Sun
AuthorDate: Tue Oct 25 10:15:10 2022 +0800

    [SPARK-40855] Add CONTRIBUTING.md for apache/spark-docker

    ### What changes were proposed in this pull request?
    This PR aims to add `CONTRIBUTING.md` for apache/spark-docker.

    ### Why are the changes needed?
    It is better to briefly explain how to contribute to the DOI.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    ![image](https://user-images.githubusercontent.com/44011673/197155544-bfae0c70-ee01-44b0-851d-ed5c288129d9.png)

    Closes #19 from dcoliversun/SPARK-40855.

    Authored-by: Qian.Sun
    Signed-off-by: Yikun Jiang
---
 CONTRIBUTING.md | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 000..4ba4baa
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,22 @@
+## Contributing to Spark Docker
+
+Thanks for improving the project! *Before opening a pull request*, review the
+[Contributing to Spark guide](https://spark.apache.org/contributing.html).
+It lists steps that are required before creating a PR. In particular, consider:
+
+- Is the change important and ready enough to ask the community to spend time reviewing?
+- Have you searched for existing, related JIRAs and pull requests?
+- Is this a new feature that can stand alone as a [third party project](https://spark.apache.org/third-party-projects.html)?
+- Is the change being proposed clearly explained and motivated?
+
+When you contribute code, you affirm that the contribution is your original work and that you
+license the work to the project under the project's open source license. Whether or not you
+state this explicitly, by submitting any copyrighted material via pull request, email, or
+other means you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
+
+### How to update Dockerfile
+
+- Update `Dockerfile.template`
+- Update `tools/template.py` if the template rendering needs to change
+- Exec `add-dockerfiles.sh `
\ No newline at end of file

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
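The "update the template, regenerate, check for no diff" flow in the CONTRIBUTING notes above can be simulated end to end. This is a minimal, self-contained sketch: the real repository renders `Dockerfile.template` with `tools/template.py` via `add-dockerfiles.sh`, while here a `sed` substitution and a made-up `BASE_IMAGE` value stand in for the rendering so the snippet runs anywhere.

```shell
# Simulate "regenerate, then verify no diff" with a stand-in renderer.
workdir="$(mktemp -d)"
printf 'FROM {{ BASE_IMAGE }}\n' > "$workdir/Dockerfile.template"

render() {  # stand-in for tools/template.py rendering the template
  sed 's/{{ BASE_IMAGE }}/eclipse-temurin:11-jre-focal/' "$workdir/Dockerfile.template"
}

render > "$workdir/Dockerfile"        # the generated file that gets checked in
render > "$workdir/Dockerfile.regen"  # regenerated during review
diff -u "$workdir/Dockerfile" "$workdir/Dockerfile.regen" && echo "no diff"
# → no diff
```

An empty diff is exactly what reviewers of this repository look for: it proves the checked-in Dockerfiles match what the template generates.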
[spark] branch master updated: [SPARK-40882][INFRA] Upgrade actions/setup-java to v3 with distribution specified
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 825f2190bd8  [SPARK-40882][INFRA] Upgrade actions/setup-java to v3 with distribution specified

825f2190bd8 is described below

commit 825f2190bd826a8a877739454393e79ef163fdf1
Author: Yikun Jiang
AuthorDate: Mon Oct 24 14:51:26 2022 +0800

    [SPARK-40882][INFRA] Upgrade actions/setup-java to v3 with distribution specified

    ### What changes were proposed in this pull request?
    Upgrade actions/setup-java to v3 with the distribution specified.

    ### Why are the changes needed?
    - The `distribution` input is required since v2; keep `zulu` (the same distribution as v1): https://github.com/actions/setup-java/releases/tag/v2.0.0
    - https://github.com/actions/setup-java/releases/tag/v3.0.0: Upgrade node
    - https://github.com/actions/setup-java/releases/tag/v3.6.0: Cleanup set-output warning

    ### Does this PR introduce _any_ user-facing change?
    No, dev only

    ### How was this patch tested?
    CI passed

    Closes #38354 from Yikun/SPARK-40882.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 .github/workflows/benchmark.yml        |  6 --
 .github/workflows/build_and_test.yml   | 27 ++-
 .github/workflows/publish_snapshot.yml |  3 ++-
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 227c444a7a4..8671cff054b 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -105,8 +105,9 @@ jobs:
         run: cd tpcds-kit/tools && make OS=LINUX
       - name: Install Java ${{ github.event.inputs.jdk }}
         if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
-        uses: actions/setup-java@v1
+        uses: actions/setup-java@v3
         with:
+          distribution: temurin
           java-version: ${{ github.event.inputs.jdk }}
       - name: Generate TPC-DS (SF=1) table data
         if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
@@ -156,8 +157,9 @@ jobs:
         restore-keys: |
           benchmark-coursier-${{ github.event.inputs.jdk }}
       - name: Install Java ${{ github.event.inputs.jdk }}
-        uses: actions/setup-java@v1
+        uses: actions/setup-java@v3
         with:
+          distribution: temurin
           java-version: ${{ github.event.inputs.jdk }}
       - name: Cache TPC-DS generated data
         if: contains(github.event.inputs.class, 'TPCDSQueryBenchmark') || contains(github.event.inputs.class, '*')
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 0e0314e2950..688c40cc3b6 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -227,8 +227,9 @@ jobs:
         restore-keys: |
           ${{ matrix.java }}-${{ matrix.hadoop }}-coursier-
       - name: Install Java ${{ matrix.java }}
-        uses: actions/setup-java@v1
+        uses: actions/setup-java@v3
         with:
+          distribution: temurin
           java-version: ${{ matrix.java }}
       - name: Install Python 3.8
         uses: actions/setup-python@v2
@@ -384,8 +385,9 @@ jobs:
         restore-keys: |
           pyspark-coursier-
       - name: Install Java ${{ matrix.java }}
-        uses: actions/setup-java@v1
+        uses: actions/setup-java@v3
         with:
+          distribution: temurin
           java-version: ${{ matrix.java }}
       - name: List Python packages (Python 3.9, PyPy3)
         run: |
@@ -473,8 +475,9 @@ jobs:
         restore-keys: |
           sparkr-coursier-
       - name: Install Java ${{ inputs.java }}
-        uses: actions/setup-java@v1
+        uses: actions/setup-java@v3
         with:
+          distribution: temurin
           java-version: ${{ inputs.java }}
       - name: Run tests
         env: ${{ fromJSON(inputs.envs) }}
@@ -597,8 +600,9 @@ jobs:
           cd docs
           bundle install
       - name: Install Java 8
-        uses: actions/setup-java@v1
+        uses: actions/setup-java@v3
         with:
+          distribution: temurin
           java-version: 8
       - name: Scala linter
         run: ./dev/lint-scala
@@ -664,8 +668,9 @@ jobs:
         restore-keys: |
           java${{ matrix.java }}-maven-
       - name: Install Java ${{ matrix.java }}
-        uses: actions/setup-java@v1
+        uses: actions/setup-java@v3
         with:
+          distribution: temurin
           java-version: ${{ matrix.java }}
       - name: Build with Maven
         run: |
@@ -713,8 +718,9 @@ jobs:
         restore-keys: |
           scala-213-coursier-
       - name: Install Java 8
-        uses: actions/setup-java@v1
+        uses: actions/setup-java@v3
         with:
+          distribution: temurin
           java-version: 8
       - name: Build with SBT
         run: |
@@ -761,8 +767,9 @@ jobs:
         rest
[spark] branch master updated (58490da6d2e -> c721c7299d8)
This is an automated email from the ASF dual-hosted git repository. yikun pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

 from 58490da6d2e  [SPARK-40800][SQL] Always inline expressions in OptimizeOneRowRelationSubquery
  add c721c7299d8  [SPARK-40881][INFRA] Upgrade actions/cache to v3 and actions/upload-artifact to v3

No new revisions were added by this update.

Summary of changes:
 .github/workflows/benchmark.yml        | 14
 .github/workflows/build_and_test.yml   | 60 +-
 .github/workflows/publish_snapshot.yml |  2 +-
 3 files changed, 38 insertions(+), 38 deletions(-)
[spark] branch master updated (f0950fea814 -> fea6458806d)
This is an automated email from the ASF dual-hosted git repository. yikun pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

 from f0950fea814  [SPARK-40878][INFRA] pin 'grpcio==1.48.1' 'protobuf==4.21.6'
  add fea6458806d  [SPARK-40870][INFRA] Upgrade docker actions to cleanup warning

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml           | 6 +++---
 .github/workflows/build_infra_images_cache.yml | 8
 2 files changed, 7 insertions(+), 7 deletions(-)
[spark-docker] branch master updated: [SPARK-40864] Remove pip/setuptools dynamic upgrade
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 52e5856  [SPARK-40864] Remove pip/setuptools dynamic upgrade

52e5856 is described below

commit 52e5856d81e70a9d9e87292c6caf42587ce433df
Author: Yikun Jiang
AuthorDate: Fri Oct 21 17:02:54 2022 +0800

    [SPARK-40864] Remove pip/setuptools dynamic upgrade

    ### What changes were proposed in this pull request?
    Remove the pip/setuptools dynamic upgrade from the Dockerfiles.

    ### Why are the changes needed?
    According to the [official image suggestion](https://github.com/docker-library/official-images#repeatability), `Rebuilding the same Dockerfile should result in the same version of the image being packaged`. We used to upgrade pip/setuptools to the latest version, but there is no reason we actually need the latest pip/setuptools. Looking at the [initial commit](https://github.com/apache-spark-on-k8s/spark/commit/befcf0a30651d0335bb57c242a824e43748db33f) that introduced this line, the merge history gives no remaining reason for it either.

    ### Does this PR introduce _any_ user-facing change?
    The OS-recommended pip/setuptools version is used.

    ### How was this patch tested?
    CI passed.

    Closes #17 from Yikun/remove-pip.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 1 -
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 1 -
 Dockerfile.template                                | 1 -
 3 files changed, 3 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index ac16bdd..8c2761e 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -26,7 +26,6 @@ RUN set -ex && \
     ln -s /lib /lib64 && \
     apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu && \
     apt install -y python3 python3-pip && \
-    pip3 install --upgrade pip setuptools && \
     apt install -y r-base r-base-dev && \
     mkdir -p /opt/spark && \
     mkdir /opt/spark/python && \
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index c6e433d..6a0017a 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -26,7 +26,6 @@ RUN set -ex && \
     ln -s /lib /lib64 && \
     apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu && \
     apt install -y python3 python3-pip && \
-    pip3 install --upgrade pip setuptools && \
     mkdir -p /opt/spark && \
     mkdir /opt/spark/python && \
     mkdir -p /opt/spark/examples && \
diff --git a/Dockerfile.template b/Dockerfile.template
index 2b90fe5..a220247 100644
--- a/Dockerfile.template
+++ b/Dockerfile.template
@@ -27,7 +27,6 @@ RUN set -ex && \
     apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu && \
 {%- if HAVE_PY %}
     apt install -y python3 python3-pip && \
-    pip3 install --upgrade pip setuptools && \
 {%- endif %}
 {%- if HAVE_R %}
     apt install -y r-base r-base-dev && \
[spark-docker] branch master updated: [SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in GA
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 6f56ef1  [SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in GA

6f56ef1 is described below

commit 6f56ef1c8c8bccd05069d4590f7ae084d4c72b4d
Author: Qian.Sun
AuthorDate: Fri Oct 21 16:02:50 2022 +0800

    [SPARK-40866][INFRA] Rename Spark repository as Spark Docker repository in GA

    ### What changes were proposed in this pull request?
    This PR aims to rename `Spark repository` to `Spark Docker repository` in GA, as discussed in https://github.com/apache/spark-docker/pull/15#discussion_r1001440707.

    ### Why are the changes needed?
    The actual repository is apache/spark-docker.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass the GA

    Closes #18 from dcoliversun/SPARK-40866.

    Authored-by: Qian.Sun
    Signed-off-by: Yikun Jiang
---
 .github/workflows/main.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index b47245b..08bba68 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -60,7 +60,7 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
-      - name: Checkout Spark repository
+      - name: Checkout Spark Docker repository
         uses: actions/checkout@v2
       - name: Set up QEMU
[spark] branch master updated: [SPARK-40860][INFRA] Change `set-output` to `GITHUB_OUTPUT`
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 40086cb9b21  [SPARK-40860][INFRA] Change `set-output` to `GITHUB_OUTPUT`

40086cb9b21 is described below

commit 40086cb9b21fe207242c4928d8e2cc3e756d61da
Author: Yikun Jiang
AuthorDate: Fri Oct 21 11:06:33 2022 +0800

    [SPARK-40860][INFRA] Change `set-output` to `GITHUB_OUTPUT`

    ### What changes were proposed in this pull request?
    Change `set-output` to `GITHUB_OUTPUT`.

    ### Why are the changes needed?
    The `set-output` command is deprecated and will be disabled soon; workflows should upgrade to using environment files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

    ### Does this PR introduce _any_ user-facing change?
    No, dev only

    ### How was this patch tested?
    - CI passed
    - Also did a local test on benchmark: https://github.com/Yikun/spark/actions/runs/3294384181/jobs/5431945626

    Closes #38323 from Yikun/set-output.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 .github/workflows/benchmark.yml      |  2 +-
 .github/workflows/build_and_test.yml | 13 ++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 5508227b8b2..f73267a95fa 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -54,7 +54,7 @@ jobs:
     steps:
       - name: Generate matrix
         id: set-matrix
-        run: echo "::set-output name=matrix::["`seq -s, 1 $SPARK_BENCHMARK_NUM_SPLITS`"]"
+        run: echo "matrix=["`seq -s, 1 $SPARK_BENCHMARK_NUM_SPLITS`"]" >> $GITHUB_OUTPUT
   # Any TPC-DS related updates on this job need to be applied to tpcds-1g job of build_and_test.yml as well
   tpcds-1g-gen:
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index e0adad54aed..f9b445e9bbd 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -103,16 +103,15 @@ jobs:
             \"k8s-integration-tests\" : \"true\",
             }"
           echo $precondition # For debugging
-          # GitHub Actions set-output doesn't take newlines
-          # https://github.community/t/set-output-truncates-multiline-strings/16852/3
-          precondition="${precondition//$'\n'/'%0A'}"
-          echo "::set-output name=required::$precondition"
+          # Remove `\n` to avoid "Invalid format" error
+          precondition="${precondition//$'\n'/}"
+          echo "required=$precondition" >> $GITHUB_OUTPUT
         else
           # This is usually set by scheduled jobs.
           precondition='${{ inputs.jobs }}'
           echo $precondition # For debugging
-          precondition="${precondition//$'\n'/'%0A'}"
-          echo "::set-output name=required::$precondition"
+          precondition="${precondition//$'\n'/}"
+          echo "required=$precondition" >> $GITHUB_OUTPUT
         fi
       - name: Generate infra image URL
         id: infra-image-outputs
@@ -121,7 +120,7 @@ jobs:
           REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]')
           IMG_NAME="apache-spark-ci-image:${{ inputs.branch }}-${{ github.run_id }}"
           IMG_URL="ghcr.io/$REPO_OWNER/$IMG_NAME"
-          echo ::set-output name=image_url::$IMG_URL
+          echo "image_url=$IMG_URL" >> $GITHUB_OUTPUT
   # Build: build Spark and run the tests for specified modules.
   build:
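The migration in this commit swaps the deprecated `::set-output` workflow command for appending `key=value` lines to the `$GITHUB_OUTPUT` environment file. A minimal sketch follows; since there is no runner-provided `$GITHUB_OUTPUT` outside of GitHub Actions, a temp file stands in for it (that stand-in is the only assumption here).

```shell
# Sketch of the set-output -> GITHUB_OUTPUT migration shown above.
GITHUB_OUTPUT="$(mktemp)"   # stand-in for the runner-provided file

# Old, deprecated form, parsed by the runner from stdout:
#   echo "::set-output name=matrix::[1,2,3]"
# New form: append key=value lines to the environment file.
echo "matrix=[1,2,3]" >> "$GITHUB_OUTPUT"

# Values must be single-line (or use the key<<EOF heredoc syntax); the commit
# strips newlines with bash's ${var//$'\n'/} -- tr does the same portably.
precondition='{"build": "true"}'
precondition="$(printf '%s' "$precondition" | tr -d '\n')"
echo "required=$precondition" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
# → matrix=[1,2,3]
# → required={"build": "true"}
```

The file-based mechanism avoids the quoting and newline-escaping pitfalls (`%0A`) that the old stdout-parsing approach required.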
[spark] branch master updated: [SPARK-40859][INFRA] Upgrade action/checkout to v3 to cleanup warning
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 17efe044fa7  [SPARK-40859][INFRA] Upgrade action/checkout to v3 to cleanup warning

17efe044fa7 is described below

commit 17efe044fa7d366fa0beafe71c5e76d46f942b7e
Author: Yikun Jiang
AuthorDate: Fri Oct 21 10:36:00 2022 +0800

    [SPARK-40859][INFRA] Upgrade action/checkout to v3 to cleanup warning

    ### What changes were proposed in this pull request?
    Upgrade action/checkout to v3 (points to v3.1 now).

    ### Why are the changes needed?
    - https://github.com/actions/checkout/releases/tag/v3.1.0 cleans up "[The 'set-output' command is deprecated and will be disabled soon.](https://github.com/actions/checkout/issues/959#issuecomment-1282107197)"
    - https://github.com/actions/checkout/releases/tag/v3.0.0: since v3, node 16 is used, which cleans up "[Node.js 12 actions are deprecated](https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/)"

    According to https://github.com/actions/checkout/issues/959#issuecomment-1282107197, v2.5 also addresses the 'set-output' warning, but only v3 supports node 16, so we upgrade to v3.1 rather than v2.5.

    ### Does this PR introduce _any_ user-facing change?
    No, dev only

    ### How was this patch tested?
    CI passed

    Closes #38322 from Yikun/checkout-v3.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 .github/workflows/benchmark.yml                |  6 +++---
 .github/workflows/build_and_test.yml           | 24
 .github/workflows/build_infra_images_cache.yml |  2 +-
 .github/workflows/publish_snapshot.yml         |  2 +-
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 52adec20e5c..5508227b8b2 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -65,7 +65,7 @@ jobs:
       SPARK_LOCAL_IP: localhost
     steps:
       - name: Checkout Spark repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         # In order to get diff files
         with:
           fetch-depth: 0
@@ -95,7 +95,7 @@ jobs:
           key: tpcds-${{ hashFiles('.github/workflows/benchmark.yml', 'sql/core/src/test/scala/org/apache/spark/sql/TPCDSSchema.scala') }}
       - name: Checkout tpcds-kit repository
         if: steps.cache-tpcds-sf-1.outputs.cache-hit != 'true'
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         with:
           repository: databricks/tpcds-kit
           ref: 2a5078a782192ddb6efbcead8de9973d6ab4f069
@@ -133,7 +133,7 @@ jobs:
       SPARK_TPCDS_DATA: ${{ github.workspace }}/tpcds-sf-1
     steps:
       - name: Checkout Spark repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         # In order to get diff files
         with:
           fetch-depth: 0
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 12a1ad0e71e..e0adad54aed 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -63,7 +63,7 @@ jobs:
       }}
     steps:
       - name: Checkout Spark repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         with:
           fetch-depth: 0
           repository: apache/spark
@@ -195,7 +195,7 @@ jobs:
       SPARK_LOCAL_IP: localhost
     steps:
       - name: Checkout Spark repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         # In order to fetch changed files
         with:
           fetch-depth: 0
@@ -286,7 +286,7 @@ jobs:
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
       - name: Checkout Spark repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         # In order to fetch changed files
         with:
           fetch-depth: 0
@@ -349,7 +349,7 @@ jobs:
       METASPACE_SIZE: 1g
     steps:
       - name: Checkout Spark repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         # In order to fetch changed files
         with:
           fetch-depth: 0
@@ -438,7 +438,7 @@ jobs:
       SKIP_MIMA: true
     steps:
       - name: Checkout Spark repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         # In order to fetch changed files
         with:
           fetch-depth: 0
@@ -508,7 +508,7 @@ jobs:
       image: ${{ needs.precondition.outputs.image_url }}
     steps:
       - name: Checkout Spark repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
         with:
           fetch-depth: 0
           repository: apache/spark
@@ -635,7 +635,7 @@ jobs:
     runs-on: ubuntu-20.04
     steps:
       - name: Check
[spark] branch master updated: [SPARK-40838][INFRA][TESTS] Upgrade infra base image to focal-20220922 and fix ps.mlflow doctest
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2698d6bf10b  [SPARK-40838][INFRA][TESTS] Upgrade infra base image to focal-20220922 and fix ps.mlflow doctest

2698d6bf10b is described below

commit 2698d6bf10b92e71e8af88fedb4e7c9e0f304416
Author: Yikun Jiang
AuthorDate: Thu Oct 20 15:54:18 2022 +0800

    [SPARK-40838][INFRA][TESTS] Upgrade infra base image to focal-20220922 and fix ps.mlflow doctest

    ### What changes were proposed in this pull request?
    Upgrade the infra base image to focal-20220922 and fix the ps.mlflow doctest.

    ### Why are the changes needed?
    - Upgrade the infra base image to `focal-20220922` (currently the latest Ubuntu 20.04).
    - Infra image Python versions updated:
      - numpy 1.23.3 --> 1.23.4
      - mlflow 1.28.0 --> 1.29.0
      - matplotlib 3.5.3 --> 3.6.1
      - pip 22.2.2 --> 22.3
      - scipy 1.9.1 --> 1.9.3

      Full list: https://www.diffchecker.com/e6eZZaYn
    - Fix the ps.mlflow doctest (due to the mlflow upgrade):
      ```
      **
      File "/__w/spark/spark/python/pyspark/pandas/mlflow.py", line 158, in pyspark.pandas.mlflow.load_model
      Failed example:
          with mlflow.start_run():
              lr = LinearRegression()
              lr.fit(train_x, train_y)
              mlflow.sklearn.log_model(lr, "model")
      Expected:
          LinearRegression(...)
      Got:
          LinearRegression()
      ```

    ### Does this PR introduce _any_ user-facing change?
    No, dev only

    ### How was this patch tested?
    All CI passed

    Closes #38304 from Yikun/SPARK-40838.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 dev/infra/Dockerfile            | 4 ++--
 python/pyspark/pandas/mlflow.py | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index ccf0c932b0e..2a70bd3f98f 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -17,9 +17,9 @@
 # Image for building and testing Spark branches. Based on Ubuntu 20.04.
 # See also in https://hub.docker.com/_/ubuntu
-FROM ubuntu:focal-20220801
+FROM ubuntu:focal-20220922

-ENV FULL_REFRESH_DATE 20220706
+ENV FULL_REFRESH_DATE 20221019

 ENV DEBIAN_FRONTEND noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN true
diff --git a/python/pyspark/pandas/mlflow.py b/python/pyspark/pandas/mlflow.py
index 094215743e2..469349b37ee 100644
--- a/python/pyspark/pandas/mlflow.py
+++ b/python/pyspark/pandas/mlflow.py
@@ -159,7 +159,7 @@ def load_model(
 ...     lr = LinearRegression()
 ...     lr.fit(train_x, train_y)
 ...     mlflow.sklearn.log_model(lr, "model")
-LinearRegression(...)
+LinearRegression...

 Now that our model is logged using MLflow, we load it back and apply it on
 a pandas-on-Spark dataframe:
[spark-docker] branch master updated: [SPARK-40845] Add template support for SPARK_GPG_KEY and fix GPG verify
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 896a36e  [SPARK-40845] Add template support for SPARK_GPG_KEY and fix GPG verify

896a36e is described below

commit 896a36e36c094bf1480f4819005e2982ea8af417
Author: Yikun Jiang
AuthorDate: Thu Oct 20 15:38:03 2022 +0800

    [SPARK-40845] Add template support for SPARK_GPG_KEY and fix GPG verify

    ### What changes were proposed in this pull request?
    This patch:
    - Adds template support for `SPARK_GPG_KEY`.
    - Fixes a bug in GPG verification (changes `||` to `;`).
    - Uses keys.openpgp.org instead of keyserver.pgp.com, because the key is uploaded there during the [Spark release process](https://spark.apache.org/release-process.html).

    ### Why are the changes needed?
    Each version has a specific GPG key to verify against, so we need to set the GPG key separately per version.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    CI passed. Ran `./add-dockerfiles.sh 3.3.0` and verified that the GPG key is set correctly.

    Closes #16 from Yikun/GPG.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 6 +++---
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 6 +++---
 3.3.0/scala2.12-java11-r-ubuntu/Dockerfile         | 6 +++---
 3.3.0/scala2.12-java11-ubuntu/Dockerfile           | 6 +++---
 Dockerfile.template                                | 6 +++---
 tools/template.py                                  | 6 ++
 6 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index be9cbb0..ac16bdd 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -45,7 +45,7 @@ RUN set -ex && \
 # https://downloads.apache.org/spark/KEYS
 ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \
     SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \
-    GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+    GPG_KEY=80FB8EBE8EBA68504989703491B5DC815DBF10D3

 RUN set -ex; \
     export SPARK_TMP="$(mktemp -d)"; \
@@ -53,8 +53,8 @@ RUN set -ex; \
     wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
     wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
     export GNUPGHOME="$(mktemp -d)"; \
-    gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
-    gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+    gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
+    gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
     gpg --batch --verify spark.tgz.asc spark.tgz; \
     gpgconf --kill all; \
     rm -rf "$GNUPGHOME" spark.tgz.asc; \
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index 096c7eb..c6e433d 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -44,7 +44,7 @@ RUN set -ex && \
 # https://downloads.apache.org/spark/KEYS
 ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \
     SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz.asc \
-    GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9
+    GPG_KEY=80FB8EBE8EBA68504989703491B5DC815DBF10D3

 RUN set -ex; \
     export SPARK_TMP="$(mktemp -d)"; \
@@ -52,8 +52,8 @@ RUN set -ex; \
     wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
     wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
     export GNUPGHOME="$(mktemp -d)"; \
-    gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \
-    gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \
+    gpg --keyserver hkps://keys.openpgp.org --recv-key "$GPG_KEY" || \
+    gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY"; \
     gpg --batch --verify spark.tgz.asc spark.tgz; \
     gpgconf --kill all; \
     rm -rf "$GNUPGHOME" spark.tgz.asc; \
diff --git a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
index 2e085a2..975e444 100644
--- a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
@@ -42,7 +42,7 @@ RUN set -ex && \
 # https://downloads.apache.org/spark/KEYS
 ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz \
     SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/
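The `||` → `;` fix in this commit matters because of how shell short-circuiting works: in a chain `a || b || c`, the third command runs only when *both* earlier commands fail, so the signature verification was silently skipped whenever a keyserver fetch succeeded. A sketch with stub functions standing in for the `gpg` calls (so it runs without network access):

```shell
# Demonstrate why `a || b || c` masked the verification step.
fetch_primary()  { false; }            # stand-in: first keyserver unreachable
fetch_fallback() { echo "fetched"; }   # stand-in: second keyserver succeeds
verify()         { echo "verified"; }  # stand-in: gpg --batch --verify

# Buggy chain (pre-fix): verify is just another fallback and never runs here.
fetch_primary || fetch_fallback || verify
# → fetched

# Fixed chain: `;` ends the fetch statement, so verify always runs.
fetch_primary || fetch_fallback
verify
# → fetched
# → verified
```

Combined with `set -e` in the Dockerfile's `RUN`, the fixed form also makes a verification failure abort the build instead of being swallowed by the `||` chain.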
[spark-docker] branch master updated: [SPARK-40833] Cleanup apt lists cache
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new 95f5a1f  [SPARK-40833] Cleanup apt lists cache

95f5a1f is described below

commit 95f5a1f3e846ad3b6550e151fa76b70f6fe0b946
Author: Yikun Jiang
AuthorDate: Wed Oct 19 10:17:58 2022 +0800

    [SPARK-40833] Cleanup apt lists cache

    ### What changes were proposed in this pull request?
    Remove the unused apt lists cache and apply `./add-dockerfiles.sh 3.3.0`.

    ### Why are the changes needed?
    Clean the cache to reduce the Docker image size. This is also [recommended](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run) by the Docker community:
    ```
    $ docker run --user 0:0 -ti apache/spark bash
    root@5d1ca347279e:/opt/spark/work-dir# ls /var/lib/apt/lists/
    auxfiles
    deb.debian.org_debian_dists_bullseye-updates_InRelease
    deb.debian.org_debian_dists_bullseye-updates_main_binary-arm64_Packages.lz4
    deb.debian.org_debian_dists_bullseye_InRelease
    deb.debian.org_debian_dists_bullseye_main_binary-arm64_Packages.lz4
    lock
    partial
    security.debian.org_debian-security_dists_bullseye-security_InRelease
    security.debian.org_debian-security_dists_bullseye-security_main_binary-arm64_Packages.lz4
    root@5d1ca347279e:/opt/spark/work-dir# du --max-depth=1 -h /var/lib/apt/lists/
    4.0K    /var/lib/apt/lists/partial
    4.0K    /var/lib/apt/lists/auxfiles
    17M     /var/lib/apt/lists/
    ```

    ### Does this PR introduce _any_ user-facing change?
    Yes, at some level: the image size is reduced.

    ### How was this patch tested?
    K8s CI passed

    Closes #14 from Yikun/clean-apt-list.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 3 ++-
 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile   | 3 ++-
 3.3.0/scala2.12-java11-r-ubuntu/Dockerfile         | 3 ++-
 3.3.0/scala2.12-java11-ubuntu/Dockerfile           | 3 ++-
 Dockerfile.template                                | 3 ++-
 5 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
index 5dbc973..be9cbb0 100644
--- a/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile
@@ -38,7 +38,8 @@ RUN set -ex && \
     ln -sv /bin/bash /bin/sh && \
     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
     chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-    rm -rf /var/cache/apt/*
+    rm -rf /var/cache/apt/* && \
+    rm -rf /var/lib/apt/lists/*

 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
diff --git a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
index 85e06ce..096c7eb 100644
--- a/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile
@@ -37,7 +37,8 @@ RUN set -ex && \
     ln -sv /bin/bash /bin/sh && \
     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
     chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-    rm -rf /var/cache/apt/*
+    rm -rf /var/cache/apt/* && \
+    rm -rf /var/lib/apt/lists/*

 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
diff --git a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
index 753d585..2e085a2 100644
--- a/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-r-ubuntu/Dockerfile
@@ -35,7 +35,8 @@ RUN set -ex && \
     ln -sv /bin/bash /bin/sh && \
     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
     chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-    rm -rf /var/cache/apt/*
+    rm -rf /var/cache/apt/* && \
+    rm -rf /var/lib/apt/lists/*

 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
diff --git a/3.3.0/scala2.12-java11-ubuntu/Dockerfile b/3.3.0/scala2.12-java11-ubuntu/Dockerfile
index 1e4c604..5858e2d 100644
--- a/3.3.0/scala2.12-java11-ubuntu/Dockerfile
+++ b/3.3.0/scala2.12-java11-ubuntu/Dockerfile
@@ -34,7 +34,8 @@ RUN set -ex && \
     ln -sv /bin/bash /bin/sh && \
     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
     chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
-    rm -rf /var/cache/apt/*
+    rm -rf /var/cache/apt/* && \
+    rm -rf /var/lib/apt/lists/*
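The cleanup above can be simulated against a scratch directory rather than a live image. In this sketch, the file names mirror the `ls /var/lib/apt/lists/` output quoted in the commit message, and the `mktemp` scratch root is the only assumption:

```shell
# Simulate the apt-lists cleanup without touching a real system path.
root="$(mktemp -d)"
mkdir -p "$root/var/lib/apt/lists/partial" "$root/var/lib/apt/lists/auxfiles"
touch "$root/var/lib/apt/lists/lock" \
      "$root/var/lib/apt/lists/deb.debian.org_debian_dists_bullseye_InRelease"

rm -rf "$root"/var/lib/apt/lists/*   # the line the commit adds to each RUN

ls -A "$root/var/lib/apt/lists" | wc -l
# → 0
```

Note that the deletion must happen in the same `RUN` instruction as the `apt install` that created the lists: a separate `RUN rm -rf ...` would only add a new layer on top while the ~17M of lists stays baked into the earlier layer, so the image would not shrink.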
[spark-docker] branch master updated: [SPARK-40832][DOCS] Add README for spark-docker
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git

The following commit(s) were added to refs/heads/master by this push:
     new c1353a3  [SPARK-40832][DOCS] Add README for spark-docker

c1353a3 is described below

commit c1353a377176d9f2a84641323840130bd160e436
Author: Yikun Jiang
AuthorDate: Wed Oct 19 10:16:41 2022 +0800

    [SPARK-40832][DOCS] Add README for spark-docker

    ### What changes were proposed in this pull request?
    Add a README for spark-docker.

    ### Why are the changes needed?
    Although the DOI PR has not been merged yet, we should briefly explain what this repository does.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Previewed manually: https://user-images.githubusercontent.com/1736354/196381318-cb3d72e1-1ba7-479c-82cb-4412dde91179.png

    Closes #13 from Yikun/readme.

    Authored-by: Yikun Jiang
    Signed-off-by: Yikun Jiang
---
 README.md | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/README.md b/README.md
new file mode 100644
index 000..87286dc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,18 @@
+# Apache Spark Official Dockerfiles
+
+## What is Apache Spark?
+
+Spark is a unified analytics engine for large-scale data processing. It provides
+high-level APIs in Scala, Java, Python, and R, and an optimized engine that
+supports general computation graphs for data analysis. It also supports a
+rich set of higher-level tools including Spark SQL for SQL and DataFrames,
+pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing,
+and Structured Streaming for stream processing.
+
+https://spark.apache.org/
+
+## About this repository
+
+This repository contains the Dockerfiles used to build the Apache Spark Docker Image.
+
+See more in [SPARK-40513: SPIP: Support Docker Official Image for Spark](https://issues.apache.org/jira/browse/SPARK-40513).
[spark-docker] branch master updated: [SPARK-40528] Support dockerfile template
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 6459e3d [SPARK-40528] Support dockerfile template 6459e3d is described below commit 6459e3d09a2e009573be355e63c404bb35139d28 Author: Yikun Jiang AuthorDate: Mon Oct 17 16:23:23 2022 +0800 [SPARK-40528] Support dockerfile template ### What changes were proposed in this pull request? This patch: - Add dockerfile template: `Dockerfile.template` contains 3 vars: `BASE_IMAGE` for base image name, `HAVE_PY` for adding python support, `HAVE_R` for adding sparkr support. - Add a script: `add-dockerfiles.sh`, you can run `./add-dockerfiles.sh 3.3.0` - Add a tool: `template.py` to help generate dockerfiles from the jinja template. ### Why are the changes needed? Generate the dockerfiles to make life easier. ### Does this PR introduce _any_ user-facing change? No, dev only. ### How was this patch tested? ```shell # Prepare new env python3 -m venv ~/xxx pip install -r ./tools/requirements.txt source ~/xxx/bin/activate # Generate 3.3.0 ./add-dockerfiles.sh 3.3.0 # no diff git diff ``` lint: ``` $ flake8 ./tools/template.py $ black ./tools/template.py All done! ✨ ✨ 1 file left unchanged. ``` Closes #12 from Yikun/SPARK-40528. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- Dockerfile.template| 98 ++ add-dockerfiles.sh | 53 +++ entrypoint.sh.template | 114 + tools/requirements.txt | 1 + tools/template.py | 84 5 files changed, 350 insertions(+) diff --git a/Dockerfile.template b/Dockerfile.template new file mode 100644 index 000..2001281 --- /dev/null +++ b/Dockerfile.template @@ -0,0 +1,98 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +FROM {{ BASE_IMAGE }} + +ARG spark_uid=185 + +RUN groupadd --system --gid=${spark_uid} spark && \ +useradd --system --uid=${spark_uid} --gid=spark spark + +RUN set -ex && \ +apt-get update && \ +ln -s /lib /lib64 && \ +apt install -y gnupg2 wget bash tini libc6 libpam-modules krb5-user libnss3 procps net-tools gosu && \ +{%- if HAVE_PY %} +apt install -y python3 python3-pip && \ +pip3 install --upgrade pip setuptools && \ +{%- endif %} +{%- if HAVE_R %} +apt install -y r-base r-base-dev && \ +{%- endif %} +mkdir -p /opt/spark && \ +{%- if HAVE_PY %} +mkdir /opt/spark/python && \ +{%- endif %} +mkdir -p /opt/spark/examples && \ +mkdir -p /opt/spark/work-dir && \ +touch /opt/spark/RELEASE && \ +chown -R spark:spark /opt/spark && \ +rm /bin/sh && \ +ln -sv /bin/bash /bin/sh && \ +echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \ +chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \ +rm -rf /var/cache/apt/* + +# Install Apache Spark +# https://downloads.apache.org/spark/KEYS +ENV SPARK_TGZ_URL=https://dlcdn.apache.org/spark/spark-{{ SPARK_VERSION }}/spark-{{ SPARK_VERSION }}-bin-hadoop3.tgz \ +SPARK_TGZ_ASC_URL=https://downloads.apache.org/spark/spark-{{ SPARK_VERSION }}/spark-{{ SPARK_VERSION }}-bin-hadoop3.tgz.asc \ +GPG_KEY=E298A3A825C0D65DFD57CBB651716619E084DAB9 + +RUN set -ex; \ +export SPARK_TMP="$(mktemp -d)"; \ +cd $SPARK_TMP; \ +wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \ +wget 
-nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \ +export GNUPGHOME="$(mktemp -d)"; \ +gpg --keyserver hkps://keyserver.pgp.com --recv-key "$GPG_KEY" || \ +gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys "$GPG_KEY" || \ +gpg --batch --verify spark.tgz.asc spark.tgz
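The templating flow above (a `Dockerfile.template` with `BASE_IMAGE`/`HAVE_PY`/`HAVE_R` vars, rendered by `tools/template.py`) can be sketched with jinja2. The template string and variable values below are trimmed illustrations for this sketch, not the repository's actual files:

```python
from jinja2 import Template  # jinja2 is what tools/requirements.txt pulls in

# Trimmed stand-in for Dockerfile.template (illustrative, not the full file).
# The "{%- if HAVE_PY %}" guard conditionally emits the python packages.
DOCKERFILE_TEMPLATE = """\
FROM {{ BASE_IMAGE }}
RUN set -ex && \\
    apt-get update && \\
{%- if HAVE_PY %}
    apt install -y python3 python3-pip && \\
{%- endif %}
    mkdir -p /opt/spark
"""

# Render a pyspark-enabled variant; the real script writes the result into a
# path like 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile.
rendered = Template(DOCKERFILE_TEMPLATE).render(
    BASE_IMAGE="eclipse-temurin:11-jre-focal",  # assumed base image name
    HAVE_PY=True,
)
print(rendered)
```

Rendering the same template with `HAVE_PY=False` simply drops the python install lines, which is how one template fans out into the scala-only, python, and r image variants.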
[spark-docker] branch master updated: [SPARK-40805] Use `spark` username in official image
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new a75ecb1 [SPARK-40805] Use `spark` username in official image a75ecb1 is described below commit a75ecb13dee5580a149f2ef0bd9f8a4371d3d956 Author: Yikun Jiang AuthorDate: Mon Oct 17 16:09:30 2022 +0800 [SPARK-40805] Use `spark` username in official image ### What changes were proposed in this pull request? This patch: - Add spark uid/gid in dockerfile (useradd and groupadd), used in entrypoint. This way is also used by [other DOIs](https://github.com/search?p=2=org%3Adocker-library+useradd=Code) and apache DOIs (such as [zookeeper](https://github.com/31z4/zookeeper-docker/blob/master/3.8.0/Dockerfile#L17-L21), [solr](https://github.com/apache/solr-docker/blob/a20477ed123cd1a72132aebcc0742cee46b5f976/9.0/Dockerfile#L108-L110), [flink](https://github.com/apache/flink-docker/blob/master/1.15/sc [...] - Use the `spark` user in `entrypoint.sh` rather than in the Dockerfile, to make sure the spark process is executed as a non-root user. - Remove the `USER` setting in the Dockerfile, to make sure the base image has permission to extend the Dockerfile, such as executing `apt update`. - Chown scripts to `spark:spark` instead of `root:root`, to avoid permission issues such as in standalone mode. - Add the `gosu` dependency, a `sudo` replacement recommended by [docker](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user) and [docker official images](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency), and also used by other DOI images.
This change also follows the rules of docker official images, see also [consistency](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency) and [dockerfile best practices about user](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user). ### Why are the changes needed? The below issues are what I have found so far: 1. **Irregular login username** The Docker image username is not very standard; `docker run` with the `185` username is a little bit weird. ``` $ docker run -ti apache/spark bash 185d88a24357413:/opt/spark/work-dir$ ``` 2. **Permission issue of spark sbin** There are also permission issues when running some spark scripts, such as standalone mode: ``` $ docker run -ti apache/spark /opt/spark/sbin/start-master.sh mkdir: cannot create directory ‘/opt/spark/logs’: Permission denied chown: cannot access '/opt/spark/logs': No such file or directory starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out /opt/spark/sbin/spark-daemon.sh: line 135: /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out: No such file or directory failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --host 1c345a00e312 --port 7077 --webui-port 8080 tail: cannot open '/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out' for reading: No such file or directory full log in /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out ``` 3. **spark as base image case is not supported well** Due to the static USER set in the Dockerfile. ``` $ cat Dockerfile FROM apache/spark RUN apt update $ docker build -t spark-test:1015 . // ...
-- > [2/2] RUN apt update: #5 0.405 E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied) #5 0.405 E: Unable to lock directory /var/lib/apt/lists/ -- executor failed running [/bin/sh -c apt update]: exit code: 100 ``` ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? - CI passed: all k8s test - Regression test: ``` # Username is set to spark rather than 185 docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash spark27bbfca0a581:/opt/spark/work-dir$ ``` ``` # start-master.sh no permission issue $ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash spark8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out ``` ``` # Image as parent case $ cat Dockerfile FROM spark:scala2.12-java11-python3-r-ubuntu RUN apt update $ docker build -t spark-test:1
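The privilege handling described above — create the `spark` user in the Dockerfile, then drop to it when the container starts — follows a common DOI pattern. The helper below is a hedged sketch of that logic, not the repository's actual `entrypoint.sh` (which `exec`s the resulting command rather than printing it):

```shell
#!/bin/bash
# Sketch of the privilege-drop pattern: hypothetical helper illustrating the
# idea; the real logic lives in entrypoint.sh.template.
run_as_spark() {
  if [ "$(id -u)" -eq 0 ]; then
    # Container started as root: hand the command to the unprivileged spark
    # user. gosu is preferred over sudo in containers for its clean signal
    # forwarding and lack of TTY side effects.
    echo "gosu spark $*"
  else
    # Already non-root (e.g. docker run --user ...): run the command directly.
    echo "$*"
  fi
}

# In a real entrypoint this would be: exec $(run_as_spark "$@")
run_as_spark /opt/spark/sbin/start-master.sh
```

Because the switch happens in the entrypoint instead of via a static `USER` line, child images can still run root-only build steps like `apt update`, which is exactly the third issue the commit message describes.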
[spark-docker] branch master updated: [SPARK-40783][INFRA] Enable Spark on K8s integration test
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 3037f75 [SPARK-40783][INFRA] Enable Spark on K8s integration test 3037f75 is described below commit 3037f75a88ca7ea57746c7d1bf49c125a828f56e Author: Yikun Jiang AuthorDate: Fri Oct 14 11:57:01 2022 +0800 [SPARK-40783][INFRA] Enable Spark on K8s integration test ### What changes were proposed in this pull request? This patch enables the Spark on K8s integration test: - **scala2.12-java11-python3-ubuntu**: Run scala / PySpark basic test - **scala2.12-java11-ubuntu**: Run scala basic test - **scala2.12-java11-r-ubuntu**: Run scala / SparkR basic test - **scala2.12-java11-python3-r-ubuntu**: Run all K8s integration tests Currently, we use the local registry as a bridge between build and test: https://user-images.githubusercontent.com/1736354/195758243-abfbea7f-05e9-4678-a3a5-cfd38cc1b8f5.png - Build: generate the image and push to the local registry - Test: load to minikube docker, run K8s tests using the specific image Because multi-platform images cannot be exported with the `docker` export type, the local registry (push) is used here rather than a local build (load). Compared to `ghcr`, it reduces the network transmission and permissions required. Also: - Upgrade `setup-qemu-action` to v2 - Upgrade `setup-buildx-action` to v2 - Remove unused `Image digest` step ### Why are the changes needed? To ensure the quality of official dockerfiles. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes #9 from Yikun/enable-k8s-it.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/main.yml | 142 - 1 file changed, 129 insertions(+), 13 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 7972703..b47245b 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -41,6 +41,15 @@ on: jobs: main: runs-on: ubuntu-latest +# Due to the multi-platform images cannot be exported with the `docker` export type, +# https://github.com/docker/buildx/issues/59 +# So, the local registry (push) is used here rather than local build (load): +# https://github.com/docker/build-push-action/blob/master/docs/advanced/local-registry.md +services: + registry: +image: registry:2 +ports: + - 5000:5000 strategy: matrix: spark_version: @@ -55,29 +64,26 @@ jobs: uses: actions/checkout@v2 - name: Set up QEMU -uses: docker/setup-qemu-action@v1 +uses: docker/setup-qemu-action@v2 - name: Set up Docker Buildx -uses: docker/setup-buildx-action@v1 - - - name: Login to GHCR -uses: docker/login-action@v2 +uses: docker/setup-buildx-action@v2 with: - registry: ghcr.io - username: ${{ github.actor }} - password: ${{ secrets.GITHUB_TOKEN }} + # This required by local registry + driver-opts: network=host - name: Generate tags run: | TAG=scala${{ matrix.scala_version }}-java${{ matrix.java_version }}-${{ matrix.image_suffix }} REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' '[:lower:]') - TEST_REPO=ghcr.io/$REPO_OWNER/spark-docker + TEST_REPO=localhost:5000/$REPO_OWNER/spark-docker IMAGE_NAME=spark IMAGE_PATH=${{ matrix.spark_version }}/$TAG UNIQUE_IMAGE_TAG=${{ matrix.spark_version }}-$TAG + IMAGE_URL=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG - # Unique image tag in each version: scala2.12-java11-python3-ubuntu + # Unique image tag in each version: 3.3.0-scala2.12-java11-python3-ubuntu echo "UNIQUE_IMAGE_TAG=${UNIQUE_IMAGE_TAG}" >> $GITHUB_ENV # Test repo: ghcr.io/apache/spark-docker echo "TEST_REPO=${TEST_REPO}" >> $GITHUB_ENV @@ -85,6 +91,8 
@@ jobs: echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV # Image dockerfile path: 3.3.0/scala2.12-java11-python3-ubuntu echo "IMAGE_PATH=${IMAGE_PATH}" >> $GITHUB_ENV + # Image URL: ghcr.io/apache/spark-docker/spark:3.3.0-scala2.12-java11-python3-ubuntu + echo "IMAGE_URL=${IMAGE_URL}" >> $GITHUB_ENV - name: Print Image tags run: | @@ -92,13 +100,121 @@ jobs: echo "TEST_REPO: "${TEST_REPO} echo "IMAGE_NAME: "${IMAGE_NAME} echo "IMAGE_PATH: "${IMAGE_PATH} +
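The "Generate tags" step shown in the diff above can be replayed in plain shell. The matrix values below are example inputs, and the repository owner is hard-coded to `apache` here instead of being derived (and lowercased) from `github.repository_owner` as the workflow does:

```shell
# Reproduce the tag wiring from .github/workflows/main.yml with example
# matrix values (spark_version / scala_version / java_version / image_suffix).
spark_version=3.3.0
scala_version=2.12
java_version=11
image_suffix=python3-ubuntu

TAG=scala${scala_version}-java${java_version}-${image_suffix}
TEST_REPO=localhost:5000/apache/spark-docker   # local registry, not ghcr.io
IMAGE_NAME=spark
IMAGE_PATH=${spark_version}/${TAG}             # Dockerfile directory
UNIQUE_IMAGE_TAG=${spark_version}-${TAG}       # unique across versions
IMAGE_URL=${TEST_REPO}/${IMAGE_NAME}:${UNIQUE_IMAGE_TAG}

echo "$IMAGE_PATH"   # 3.3.0/scala2.12-java11-python3-ubuntu
echo "$IMAGE_URL"    # localhost:5000/apache/spark-docker/spark:3.3.0-scala2.12-java11-python3-ubuntu
```

Prefixing the tag with the Spark version is what makes each build's image URL unique, so the K8s test job can pull exactly the image that the build job pushed to the local registry.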
[spark-docker] branch master updated: [SPARK-40754][DOCS] Add LICENSE and NOTICE
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new fc07aed [SPARK-40754][DOCS] Add LICENSE and NOTICE fc07aed is described below commit fc07aeda1f48eb2aae9a441dfe94ae95f697e222 Author: Yikun Jiang AuthorDate: Thu Oct 13 21:47:15 2022 +0800 [SPARK-40754][DOCS] Add LICENSE and NOTICE ### What changes were proposed in this pull request? This patch adds LICENSE and NOTICE: - LICENSE: https://www.apache.org/licenses/LICENSE-2.0.txt - NOTICE: https://github.com/apache/spark/blob/master/NOTICE ### Why are the changes needed? https://www.apache.org/licenses/LICENSE-2.0#apply See also: https://github.com/apache/spark-docker/pull/2#issuecomment-1274807917 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No need Closes #6 from Yikun/SPARK-40754. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- LICENSE | 202 NOTICE | 6 ++ 2 files changed, 208 insertions(+) diff --git a/LICENSE b/LICENSE new file mode 100644 index 000..d645695 --- /dev/null +++ b/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 +http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity.
For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on b
[spark-docker] branch master updated: [SPARK-40746][INFRA] Fix Dockerfile build workflow
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new c116698 [SPARK-40746][INFRA] Fix Dockerfile build workflow c116698 is described below commit c11669850c0c03212df6d5c84c01050e6c933076 Author: Yikun Jiang AuthorDate: Wed Oct 12 10:48:51 2022 +0800 [SPARK-40746][INFRA] Fix Dockerfile build workflow ### What changes were proposed in this pull request? This patch is to make the workflow work in the apache repo: - Add `.github/workflows/build_3.3.0.yaml` and `3.3.0/**` to trigger paths - Change `apache/spark-docker:TAG` to `ghcr.io/apache/spark-docker/spark:TAG` - Remove the push; we only need to build locally to validate the Dockerfile. Even for a future K8s IT test we can refactor to use minikube docker, so it can still be a local build. ### Why are the changes needed? To make the workflow work well in the apache repo. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes: https://github.com/apache/spark-docker/pull/5 Closes #7 from Yikun/SPARK-40746.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.3.0.yaml | 3 ++- .github/workflows/main.yml | 3 +-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/build_3.3.0.yaml b/.github/workflows/build_3.3.0.yaml index 63b1ab3..7e7ce39 100644 --- a/.github/workflows/build_3.3.0.yaml +++ b/.github/workflows/build_3.3.0.yaml @@ -24,7 +24,8 @@ on: branches: - 'master' paths: - - '3.3.0/' + - '3.3.0/**' + - '.github/workflows/build_3.3.0.yaml' - '.github/workflows/main.yml' jobs: diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 90bd706..7972703 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -97,8 +97,7 @@ jobs: uses: docker/build-push-action@v2 with: context: ${{ env.IMAGE_PATH }} - push: true - tags: ${{ env.TEST_REPO }}:${{ env.UNIQUE_IMAGE_TAG }} + tags: ${{ env.TEST_REPO }}/${{ env.IMAGE_NAME }}:${{ env.UNIQUE_IMAGE_TAG }} platforms: linux/amd64,linux/arm64 - name: Image digest
[spark-docker] branch master updated: [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new 30fd82f [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker 30fd82f is described below commit 30fd82f313c4ecd44f4181e6a4cf2e1d9463c628 Author: Yikun Jiang AuthorDate: Wed Oct 12 10:47:31 2022 +0800 [SPARK-40757][INFRA] Add PULL_REQUEST_TEMPLATE for spark-docker ### What changes were proposed in this pull request? Initialize with https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE and remove some unused notes ### Why are the changes needed? Add PULL_REQUEST_TEMPLATE for `spark-docker` ### Does this PR introduce _any_ user-facing change? No, dev only ### How was this patch tested? New PR after this merged Closes #8 from Yikun/SPARK-40757. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/PULL_REQUEST_TEMPLATE | 41 + 1 file changed, 41 insertions(+) diff --git a/.github/PULL_REQUEST_TEMPLATE b/.github/PULL_REQUEST_TEMPLATE new file mode 100644 index 000..5268131 --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE @@ -0,0 +1,41 @@ + + +### What changes were proposed in this pull request? + + + +### Why are the changes needed? + + + +### Does this PR introduce _any_ user-facing change? + + + +### How was this patch tested? +
[spark-docker] branch master updated: [SPARK-40516] Add Apache Spark 3.3.0 Dockerfile
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new e61aba1 [SPARK-40516] Add Apache Spark 3.3.0 Dockerfile e61aba1 is described below commit e61aba1ed4ca8e747f38cae5f6bd72a3a50f57cd Author: Yikun Jiang AuthorDate: Tue Oct 11 10:45:57 2022 +0800 [SPARK-40516] Add Apache Spark 3.3.0 Dockerfile ### What changes were proposed in this pull request? This patch adds Apache Spark 3.3.0 Dockerfile: - 3.3.0-scala2.12-java11-python3-ubuntu: pyspark + scala - 3.3.0-scala2.12-java11-ubuntu: scala - 3.3.0-scala2.12-java11-r-ubuntu: sparkr + scala - 3.3.0-scala2.12-java11-python3-r-ubuntu: All in one image ### Why are the changes needed? This is needed by the Docker Official Image. See also: https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? **The action won't be triggered until the workflow is merged to the default branch**, so I can only test it in my local repo: - local test: https://github.com/Yikun/spark-docker/pull/1 ![image](https://user-images.githubusercontent.com/1736354/194975185-d5843c84-bbba-48d0-bbf0-363532c6712d.png) - Dockerfile E2E K8S Local test: https://github.com/Yikun/spark-docker-bak/pull/7 ![image](https://user-images.githubusercontent.com/1736354/194975267-6dca0de5-c715-4e0f-b735-22752b7912de.png) Closes #2 from Yikun/SPARK-40516.
Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- .github/workflows/build_3.3.0.yaml | 38 .github/workflows/main.yml | 105 3.3.0/scala2.12-java11-python3-r-ubuntu/Dockerfile | 84 .../entrypoint.sh | 107 + 3.3.0/scala2.12-java11-python3-ubuntu/Dockerfile | 81 .../scala2.12-java11-python3-ubuntu/entrypoint.sh | 107 + 3.3.0/scala2.12-java11-r-ubuntu/Dockerfile | 79 +++ 3.3.0/scala2.12-java11-r-ubuntu/entrypoint.sh | 107 + 3.3.0/scala2.12-java11-ubuntu/Dockerfile | 76 +++ 3.3.0/scala2.12-java11-ubuntu/entrypoint.sh| 107 + 10 files changed, 891 insertions(+) diff --git a/.github/workflows/build_3.3.0.yaml b/.github/workflows/build_3.3.0.yaml new file mode 100644 index 000..63b1ab3 --- /dev/null +++ b/.github/workflows/build_3.3.0.yaml @@ -0,0 +1,38 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+# + +name: "Build and Test (3.3.0)" + +on: + pull_request: +branches: + - 'master' +paths: + - '3.3.0/' + - '.github/workflows/main.yml' + +jobs: + run-build: +name: Run +secrets: inherit +uses: ./.github/workflows/main.yml +with: + spark: 3.3.0 + scala: 2.12 + java: 11 diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml new file mode 100644 index 000..90bd706 --- /dev/null +++ b/.github/workflows/main.yml @@ -0,0 +1,105 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +name: Main (Build/Test/Publish) + +on: + workflow_cal
[spark-docker] branch master updated: [SPARK-40727][INFRA] Add merge_spark_docker_pr.py
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git The following commit(s) were added to refs/heads/master by this push: new fa2d1a5 [SPARK-40727][INFRA] Add merge_spark_docker_pr.py fa2d1a5 is described below commit fa2d1a59b6e47b1e4072154de0b1f215780af595 Author: Yikun Jiang AuthorDate: Mon Oct 10 20:34:43 2022 +0800 [SPARK-40727][INFRA] Add merge_spark_docker_pr.py ### What changes were proposed in this pull request? This patch adds merge_spark_docker_pr.py to help merge `spark-docker` commits and resolve Spark JIRA issues. The script is from https://github.com/apache/spark/blob/ef837ca71020950b841f9891c70dc4b29d968bf1/dev/merge_spark_pr.py and changes `spark` to `spark-docker`: https://github.com/apache/spark-docker/commit/e4107a74d348656041612ff68a647c6051894240 ### Why are the changes needed? To help merge spark-docker commits. ### Does this PR introduce _any_ user-facing change? No, dev only ### How was this patch tested? Will merge it by using itself Closes #1 from Yikun/merge_script. Authored-by: Yikun Jiang Signed-off-by: Yikun Jiang --- merge_spark_docker_pr.py | 571 +++ 1 file changed, 571 insertions(+) diff --git a/merge_spark_docker_pr.py b/merge_spark_docker_pr.py new file mode 100755 index 000..578a280 --- /dev/null +++ b/merge_spark_docker_pr.py @@ -0,0 +1,571 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License.
You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Utility for creating well-formed pull request merges and pushing them to Apache +# Spark. +# usage: ./merge_spark_docker_pr.py(see config env vars below) +# +# This utility assumes you already have a local Spark git folder and that you +# have added remotes corresponding to both (i) the github apache Spark +# mirror and (ii) the apache git repo. + +import json +import os +import re +import subprocess +import sys +import traceback +from urllib.request import urlopen +from urllib.request import Request +from urllib.error import HTTPError + +try: +import jira.client + +JIRA_IMPORTED = True +except ImportError: +JIRA_IMPORTED = False + +# Location of your Spark git development area +SPARK_DOCKER_HOME = os.environ.get("SPARK_DOCKER_HOME", os.getcwd()) +# Remote name which points to the Gihub site +PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache-github") +# Remote name which points to Apache git +PUSH_REMOTE_NAME = os.environ.get("PUSH_REMOTE_NAME", "apache") +# ASF JIRA username +JIRA_USERNAME = os.environ.get("JIRA_USERNAME", "") +# ASF JIRA password +JIRA_PASSWORD = os.environ.get("JIRA_PASSWORD", "") +# OAuth key used for issuing requests against the GitHub API. If this is not defined, then requests +# will be unauthenticated. You should only need to configure this if you find yourself regularly +# exceeding your IP's unauthenticated request rate limit. You can create an OAuth key at +# https://github.com/settings/tokens. This script only requires the "public_repo" scope. 
+GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY") + + +GITHUB_BASE = "https://github.com/apache/spark-docker/pull" +GITHUB_API_BASE = "https://api.github.com/repos/apache/spark-docker" +JIRA_BASE = "https://issues.apache.org/jira/browse" +JIRA_API_BASE = "https://issues.apache.org/jira" +# Prefix added to temporary branches +BRANCH_PREFIX = "PR_TOOL" + + +def get_json(url): +try: +request = Request(url) +if GITHUB_OAUTH_KEY: +request.add_header("Authorization", "token %s" % GITHUB_OAUTH_KEY) +return json.load(urlopen(request)) +except HTTPError as e: +if "X-RateLimit-Remaining" in e.headers and e.headers["X-RateLimit-Remaining"] == "0": +print( +"Exceeded the GitHub API rate limit; see the instructions in " +
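The optional-OAuth logic in `get_json` above boils down to attaching the token header only when `GITHUB_OAUTH_KEY` is configured. A self-contained sketch of that pattern (the endpoint URL is illustrative, and `build_request` is a hypothetical helper, not part of the script):

```python
import json
import os
from urllib.request import Request, urlopen


def build_request(url, token=None):
    # Mirror of the script's auth handling: attach the OAuth token only when
    # one is configured, otherwise fall back to unauthenticated requests
    # (which are subject to GitHub's per-IP rate limit).
    request = Request(url)
    if token:
        request.add_header("Authorization", "token %s" % token)
    return request


def get_json(url):
    token = os.environ.get("GITHUB_OAUTH_KEY")  # same env var as the script
    return json.load(urlopen(build_request(url, token)))
```

Separating request construction from the network call keeps the header logic testable without touching the GitHub API.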
[spark] branch master updated: [SPARK-40725][INFRA] Add `mypy-protobuf` to dev/requirements
This is an automated email from the ASF dual-hosted git repository. yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3fa958af326 [SPARK-40725][INFRA] Add `mypy-protobuf` to dev/requirements 3fa958af326 is described below commit 3fa958af326582d8638f36f90b91fe7045f396bf Author: Ruifeng Zheng AuthorDate: Mon Oct 10 17:30:12 2022 +0800 [SPARK-40725][INFRA] Add `mypy-protobuf` to dev/requirements ### What changes were proposed in this pull request? Add `mypy-protobuf` to dev/requirements ### Why are the changes needed? `connector/connect/dev/generate_protos.sh` requires this package: ``` DEBUG /buf.alpha.registry.v1alpha1.GenerateService/GeneratePlugins {"duration": "14.25µs", "http.path": "/buf.alpha.registry.v1alpha1.GenerateService/GeneratePlugins", "http.url": "https://api.buf.build/buf.alpha.registry.v1alpha1.GenerateService/GeneratePlugins", "http.host": "api.buf.build", "http.method": "POST", "http.user_agent": "connect-go/0.4.0-dev (go1.19.2)"} DEBUG command {"duration": "9.238333ms"} Failure: plugin mypy: could not find protoc plugin for name mypy ``` ### Does this PR introduce _any_ user-facing change? No, only for contributors ### How was this patch tested? manually check Closes #38186 from zhengruifeng/add_mypy-protobuf_to_requirements. Authored-by: Ruifeng Zheng Signed-off-by: Yikun Jiang --- dev/requirements.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/dev/requirements.txt b/dev/requirements.txt index c610d84c11a..4b47c1f6e83 100644 --- a/dev/requirements.txt +++ b/dev/requirements.txt @@ -48,4 +48,5 @@ black==22.6.0 # Spark Connect grpcio==1.48.1 -protobuf==4.21.6 \ No newline at end of file +protobuf==4.21.6 +mypy-protobuf
[spark-docker] 01/01: [SPARK-40723][INFRA] Add .asf.yaml to spark-docker
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git

commit c5b015ac2014bfeb47daafd454b610ae9633f676
Author: Yikun Jiang
AuthorDate: Mon Oct 10 16:40:34 2022 +0800

### What changes were proposed in this pull request?

This change adds the .asf.yaml as the first commit.

### Why are the changes needed?

Initialize the repo settings.

### Does this PR introduce _any_ user-facing change?

No, dev only.

### How was this patch tested?

See result after merged.

Authored-by: Yikun Jiang
Signed-off-by: Yikun Jiang
---
 .asf.yaml | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/.asf.yaml b/.asf.yaml
new file mode 100644
index 000..cc7385f
--- /dev/null
+++ b/.asf.yaml
@@ -0,0 +1,39 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
+---
+github:
+  description: "Official Dockerfile for Apache Spark"
+  homepage: https://spark.apache.org/
+  labels:
+    - python
+    - scala
+    - r
+    - java
+    - big-data
+    - jdbc
+    - sql
+    - spark
+  enabled_merge_buttons:
+    merge: false
+    squash: true
+    rebase: true
+
+notifications:
+  pullrequests: revi...@spark.apache.org
+  issues: revi...@spark.apache.org
+  commits: commits@spark.apache.org
[spark-docker] branch master created (now c5b015a)
This is an automated email from the ASF dual-hosted git repository.

yikun pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark-docker.git

at c5b015a [SPARK-40723][INFRA] Add .asf.yaml to spark-docker

This branch includes the following new commits:
     new c5b015a [SPARK-40723][INFRA] Add .asf.yaml to spark-docker

The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.