hudi-agent commented on code in PR #18687:
URL: https://github.com/apache/hudi/pull/18687#discussion_r3180232909
##########
docker/README.md:
##########
@@ -139,59 +149,17 @@ Status: running
Platforms: linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6
```
-Now goto `<HUDI_REPO_DIR>/docker/hoodie/hadoop` and change the `Dockerfile` to pull dependent images corresponding to
-arm64. For example, in [base/Dockerfile](./hoodie/hadoop/base/Dockerfile) (which pulls jdk11 image), change the
-line `FROM openjdk:11-jdk-slim-bullseye` to `FROM arm64v8/openjdk:11-jdk-slim-bullseye`.
-
-Then, from under `<HUDI_REPO_DIR>/docker/hoodie/hadoop` directory, execute the following command to build as well as
-push the image to the dockerhub repo:
+Then run the script from under `<HUDI_REPO_DIR>/docker`:
-```
-# Run under hoodie/hadoop, the <tag> is optional, "latest" by default
-docker buildx build <image_folder_name> --platform <comma-separated,platforms> -t <hub-user>/<repo-name>[:<tag>] --push
-
-# For example, to build base image
-docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
-```
-
-Once the base image is pushed then you could do something similar for other images.
-Change [hive](./hoodie/hadoop/hive_base/Dockerfile) dockerfile to pull the base image with tag corresponding to
-linux/arm64 platform.
-
-```
-# Change below line in the Dockerfile
-FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest
-# as shown below
-FROM --platform=linux/arm64 apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:linux-arm64-0.10.1
+```shell
+./build_docker_images.sh --multi-arch
-# and then build & push from under hoodie/hadoop dir
-docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
+# Example with explicit component versions
Review Comment:
🤖 It might help to call out that `--multi-arch` pushes to the hardcoded
`apachehudi/...` repositories on Docker Hub. As written ("Docker Hub push
permissions are required") a reader could reasonably assume any Docker Hub
account would work, but the script has no flag or env var to override the
registry/repo prefix, so only users with write access to the `apachehudi` org
can actually use this path. Could we either (a) state that this flow is
intended for Hudi committers/release managers with `apachehudi` push access, or
(b) document the workaround (edit the script to swap the prefix)? @yihua could
you confirm the intended audience for `--multi-arch`?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
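For option (b), a minimal sketch of the prefix swap could look like the following. It assumes the script references the repositories with the literal string `apachehudi/`, which has not been checked against the actual script; `rewrite_repo_prefix` is a hypothetical helper name, not something the PR provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a copy of a build script in which the
# "apachehudi/" image prefix is replaced by another Docker Hub namespace.
# Assumption: the prefix appears as the literal text "apachehudi/".
# The new prefix must not contain "#" or "&", since both are special to
# the sed expression below.
rewrite_repo_prefix() {
  local src="$1" new_prefix="$2" dest="$3"
  sed "s#apachehudi/#${new_prefix}/#g" "$src" > "$dest"
}

# Example (not executed here):
#   rewrite_repo_prefix build_docker_images.sh <hub-user> build_docker_images_custom.sh
```

Writing to a separate file leaves the tracked script untouched, so the swap does not end up in a commit by accident.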
##########
docker/README.md:
##########
@@ -139,59 +149,17 @@ Status: running
Platforms: linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6
```
-Now goto `<HUDI_REPO_DIR>/docker/hoodie/hadoop` and change the `Dockerfile` to pull dependent images corresponding to
-arm64. For example, in [base/Dockerfile](./hoodie/hadoop/base/Dockerfile) (which pulls jdk11 image), change the
-line `FROM openjdk:11-jdk-slim-bullseye` to `FROM arm64v8/openjdk:11-jdk-slim-bullseye`.
-
-Then, from under `<HUDI_REPO_DIR>/docker/hoodie/hadoop` directory, execute the following command to build as well as
-push the image to the dockerhub repo:
+Then run the script from under `<HUDI_REPO_DIR>/docker`:
-```
-# Run under hoodie/hadoop, the <tag> is optional, "latest" by default
-docker buildx build <image_folder_name> --platform <comma-separated,platforms> -t <hub-user>/<repo-name>[:<tag>] --push
-
-# For example, to build base image
-docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
-```
-
-Once the base image is pushed then you could do something similar for other images.
-Change [hive](./hoodie/hadoop/hive_base/Dockerfile) dockerfile to pull the base image with tag corresponding to
-linux/arm64 platform.
-
-```
-# Change below line in the Dockerfile
-FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest
-# as shown below
-FROM --platform=linux/arm64 apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:linux-arm64-0.10.1
+```shell
+./build_docker_images.sh --multi-arch
-# and then build & push from under hoodie/hadoop dir
-docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
+# Example with explicit component versions
+./build_docker_images.sh --hadoop-version 3.4.0 --spark-version 4.0.1 --hive-version 3.1.3 --multi-arch
```
-Similarly, for images that are dependent on hive (e.g. [base spark](./hoodie/hadoop/spark_base/Dockerfile)
-, [sparkmaster](./hoodie/hadoop/sparkmaster/Dockerfile), [sparkworker](./hoodie/hadoop/sparkworker/Dockerfile)
-and [sparkadhoc](./hoodie/hadoop/sparkadhoc/Dockerfile)), change the corresponding Dockerfile to pull the base hive
-image with tag corresponding to arm64. Then build and push using `docker buildx` command.
-
-For the sake of completeness, here is a [patch](https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2) which
-shows what changes to make in Dockerfiles (assuming tag is named `linux-arm64-0.10.1`), and below is the list
-of `docker buildx` commands.
-
-```
-docker buildx build base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
-docker buildx build datanode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-datanode:linux-arm64-0.10.1 --push
-docker buildx build historyserver --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1 --push
-docker buildx build hive_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
-docker buildx build namenode --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1 --push
-docker buildx build prestobase --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1 --push
-docker buildx build spark_base --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_2.4.4:linux-arm64-0.10.1 --push
-docker buildx build sparkadhoc --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:linux-arm64-0.10.1 --push
-docker buildx build sparkmaster --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4:linux-arm64-0.10.1 --push
-docker buildx build sparkworker --platform linux/arm64 -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:linux-arm64-0.10.1 --push
-```
+When `--multi-arch` is enabled, the script builds and pushes the amd64 and arm64 variants in one pass. Use
+`--version-tag <tag>` to override the image tag used for the push.
Review Comment:
🤖 The README says "No Dockerfile changes are needed for the amd64 plus arm64
build flow above." That holds only if every `FROM` base image in
`docker/hoodie/hadoop/*/Dockerfile` publishes both linux/amd64 and linux/arm64
manifests on Docker Hub, which is not guaranteed for older bases (e.g., the
previous `openjdk:11-jdk-slim-bullseye` did publish arm64, but some older
tagged images do not). Could we confirm that the current base images all ship
multi-arch manifests, or otherwise note this as a prerequisite?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
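One way to check this prerequisite is to collect the `FROM` images and inspect each one. The sketch below only does the collection step; `list_base_images` is a hypothetical helper, and it prints image references exactly as written, so anything built from an `ARG` such as `${HADOOP_VERSION}` would still need to be expanded by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct image references used in FROM lines
# of every Dockerfile under a directory. Flags such as --platform=... are
# skipped; build-arg variables are printed unexpanded.
list_base_images() {
  local dir="$1"
  find "$dir" -name Dockerfile -exec cat {} + \
    | awk 'toupper($1) == "FROM" {
             for (i = 2; i <= NF; i++) if ($i !~ /^--/) { print $i; break }
           }' \
    | sort -u
}

# Example (not executed here; requires Docker with buildx and registry access):
#   list_base_images hoodie/hadoop | while read -r image; do
#     docker buildx imagetools inspect "$image"
#   done
```

The `imagetools inspect` output lists the platforms in each manifest, so an image missing linux/arm64 should be easy to spot.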
##########
docker/README.md:
##########
@@ -43,6 +43,21 @@ To build all docker images locally, you can run the script:
./build_local_docker_images.sh
```
+To build the Docker demo images directly with the script, run it from under `<HUDI_REPO_DIR>/docker`:
+
+```shell
+./build_docker_images.sh
+```
+
+You can override the Hadoop, Spark, and Hive versions from the command line:
+
+```shell
Review Comment:
🤖 The example uses `--hadoop-version 3.4.0 --spark-version 4.0.1
--hive-version 3.1.3`, but the script defaults are Hadoop 2.8.4 / Spark 3.5.3 /
Hive 2.3.10 and there's no validation of supported combinations. Have these
specific versions been verified to build successfully end-to-end (Dockerfile
downloads, Java base image selection, demo compose files, etc.)? If not, it
might be safer to use a combination known to work, or add a note that arbitrary
version combinations are not guaranteed to be supported.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
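If validation were added, it could be as small as an allow-list check. This is a sketch only: `KNOWN_COMBINATIONS` and `is_known_combination` are hypothetical names, and the list contains just the default combination quoted above, since no other combination is confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical allow-list of hadoop:spark:hive combinations, space-separated.
# Only the script defaults mentioned in this review are listed.
KNOWN_COMBINATIONS="2.8.4:3.5.3:2.3.10"

# Return 0 if the given Hadoop, Spark, and Hive versions form a listed
# combination, 1 otherwise.
is_known_combination() {
  local candidate="$1:$2:$3" known
  for known in $KNOWN_COMBINATIONS; do
    [ "$candidate" = "$known" ] && return 0
  done
  return 1
}

# Example (not executed here):
#   is_known_combination 3.4.0 4.0.1 3.1.3 || echo "warning: untested version combination" >&2
```

A warning rather than a hard failure would keep the override flags usable while still flagging combinations nobody has built.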