hudi-agent commented on code in PR #18687:
URL: https://github.com/apache/hudi/pull/18687#discussion_r3180232909


##########
docker/README.md:
##########
@@ -139,59 +149,17 @@ Status:    running
 Platforms: linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6
 ```
 
-Now goto `<HUDI_REPO_DIR>/docker/hoodie/hadoop` and change the `Dockerfile` to 
pull dependent images corresponding to
-arm64. For example, in [base/Dockerfile](./hoodie/hadoop/base/Dockerfile) 
(which pulls jdk11 image), change the
-line `FROM openjdk:11-jdk-slim-bullseye` to `FROM 
arm64v8/openjdk:11-jdk-slim-bullseye`.
-
-Then, from under `<HUDI_REPO_DIR>/docker/hoodie/hadoop` directory, execute the 
following command to build as well as
-push the image to the dockerhub repo:
+Then run the script from under `<HUDI_REPO_DIR>/docker`:
 
-```
-# Run under hoodie/hadoop, the <tag> is optional, "latest" by default
-docker buildx build <image_folder_name> --platform <comma-separated,platforms> 
-t <hub-user>/<repo-name>[:<tag>] --push
-
-# For example, to build base image
-docker buildx build base --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
-```
-
-Once the base image is pushed then you could do something similar for other 
images.
-Change [hive](./hoodie/hadoop/hive_base/Dockerfile) dockerfile to pull the 
base image with tag corresponding to
-linux/arm64 platform.
-
-```
-# Change below line in the Dockerfile
-FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest
-# as shown below
-FROM --platform=linux/arm64 
apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:linux-arm64-0.10.1
+```shell
+./build_docker_images.sh --multi-arch
 
-# and then build & push from under hoodie/hadoop dir
-docker buildx build hive_base --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
+# Example with explicit component versions

Review Comment:
   🤖 It might help to call out that `--multi-arch` pushes to the hardcoded 
`apachehudi/...` repositories on Docker Hub. As written ("Docker Hub push 
permissions are required") a reader could reasonably assume any Docker Hub 
account would work, but the script has no flag or env var to override the 
registry/repo prefix, so only users with write access to the `apachehudi` org 
can actually use this path. Could we either (a) state that this flow is 
intended for Hudi committers/release managers with `apachehudi` push access, or 
(b) document the workaround (edit the script to swap the prefix)? @yihua could 
you confirm the intended audience for `--multi-arch`?
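
A minimal sketch of workaround (b), assuming the prefix appears literally as `apachehudi/` inside the script (not verified against the script here). The helper name `swap_repo_prefix`, the output file name, and the account name `myuser` are hypothetical:

```shell
# Write a copy of a script with the image repository prefix swapped.
# Assumption: the prefix appears literally as "apachehudi/" in the script.
swap_repo_prefix() {
  src="$1"
  dst="$2"
  new_prefix="$3"
  # Use '#' as the sed delimiter so the '/' in the prefix needs no escaping.
  sed "s#apachehudi/#${new_prefix}/#g" "$src" > "$dst"
}

# Example usage (hypothetical names), run from <HUDI_REPO_DIR>/docker:
#   swap_repo_prefix build_docker_images.sh build_docker_images.local.sh myuser
#   chmod +x build_docker_images.local.sh
#   ./build_docker_images.local.sh --multi-arch
```

Working on a copy keeps the tracked script unchanged, so the edit cannot be committed by accident.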
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
docker/README.md:
##########
@@ -139,59 +149,17 @@ Status:    running
 Platforms: linux/amd64, linux/arm64, linux/arm/v7, linux/arm/v6
 ```
 
-Now goto `<HUDI_REPO_DIR>/docker/hoodie/hadoop` and change the `Dockerfile` to 
pull dependent images corresponding to
-arm64. For example, in [base/Dockerfile](./hoodie/hadoop/base/Dockerfile) 
(which pulls jdk11 image), change the
-line `FROM openjdk:11-jdk-slim-bullseye` to `FROM 
arm64v8/openjdk:11-jdk-slim-bullseye`.
-
-Then, from under `<HUDI_REPO_DIR>/docker/hoodie/hadoop` directory, execute the 
following command to build as well as
-push the image to the dockerhub repo:
+Then run the script from under `<HUDI_REPO_DIR>/docker`:
 
-```
-# Run under hoodie/hadoop, the <tag> is optional, "latest" by default
-docker buildx build <image_folder_name> --platform <comma-separated,platforms> 
-t <hub-user>/<repo-name>[:<tag>] --push
-
-# For example, to build base image
-docker buildx build base --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
-```
-
-Once the base image is pushed then you could do something similar for other 
images.
-Change [hive](./hoodie/hadoop/hive_base/Dockerfile) dockerfile to pull the 
base image with tag corresponding to
-linux/arm64 platform.
-
-```
-# Change below line in the Dockerfile
-FROM apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:latest
-# as shown below
-FROM --platform=linux/arm64 
apachehudi/hudi-hadoop_${HADOOP_VERSION}-base:linux-arm64-0.10.1
+```shell
+./build_docker_images.sh --multi-arch
 
-# and then build & push from under hoodie/hadoop dir
-docker buildx build hive_base --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
+# Example with explicit component versions
+./build_docker_images.sh --hadoop-version 3.4.0 --spark-version 4.0.1 --hive-version 3.1.3 --multi-arch
 ```
 
-Similarly, for images that are dependent on hive (e.g. [base 
spark](./hoodie/hadoop/spark_base/Dockerfile)
-, [sparkmaster](./hoodie/hadoop/sparkmaster/Dockerfile), 
[sparkworker](./hoodie/hadoop/sparkworker/Dockerfile)
-and [sparkadhoc](./hoodie/hadoop/sparkadhoc/Dockerfile)), change the 
corresponding Dockerfile to pull the base hive
-image with tag corresponding to arm64. Then build and push using `docker 
buildx` command.
-
-For the sake of completeness, here is a 
[patch](https://gist.github.com/xushiyan/cec16585e884cf0693250631a1d10ec2) which
-shows what changes to make in Dockerfiles (assuming tag is named 
`linux-arm64-0.10.1`), and below is the list
-of `docker buildx` commands.
-
-```
-docker buildx build base --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-base:linux-arm64-0.10.1 --push
-docker buildx build datanode --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-datanode:linux-arm64-0.10.1 --push
-docker buildx build historyserver --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-history:linux-arm64-0.10.1 --push
-docker buildx build hive_base --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3:linux-arm64-0.10.1 --push
-docker buildx build namenode --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-namenode:linux-arm64-0.10.1 --push
-docker buildx build prestobase --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-prestobase_0.217:linux-arm64-0.10.1 --push
-docker buildx build spark_base --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkbase_2.4.4:linux-arm64-0.10.1 
--push
-docker buildx build sparkadhoc --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:linux-arm64-0.10.1 
--push
-docker buildx build sparkmaster --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4:linux-arm64-0.10.1 
--push
-docker buildx build sparkworker --platform linux/arm64 -t 
apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4:linux-arm64-0.10.1 
--push
-```
+When `--multi-arch` is enabled, the script builds and pushes the amd64 and arm64 variants in one pass. Use
+`--version-tag <tag>` to override the image tag used for the push.

Review Comment:
   🤖 This says "No Dockerfile changes are needed for the amd64 plus arm64 build 
flow above." That holds only if every `FROM` base image in 
`docker/hoodie/hadoop/*/Dockerfile` publishes both linux/amd64 and linux/arm64 
manifests on Docker Hub, which is not guaranteed for older bases (the previous 
`openjdk:11-jdk-slim-bullseye` did publish arm64, but some older tagged images 
do not). Could we confirm that the current base images all ship multi-arch 
manifests, or otherwise note this as a prerequisite?
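
One way to check this prerequisite, as a rough sketch: list the base images named on `FROM` lines, then ask the registry which platforms each one publishes. The function names are made up for illustration; `docker buildx imagetools inspect` is the real command used for the lookup. Image names that contain build args such as `${HADOOP_VERSION}` are not resolved here, and images built by the script itself would need to be skipped.

```shell
# Print the unique base images referenced by FROM lines under a directory.
list_base_images() {
  dir="$1"
  # Handle "FROM [--platform=...] image [AS name]" and print only the image.
  find "$dir" -name Dockerfile -exec cat {} + \
    | awk 'toupper($1) == "FROM" { i = 2; if ($i ~ /^--platform=/) i++; print $i }' \
    | sort -u
}

# Report whether an image's manifest mentions both target platforms.
# Assumes the image is publicly readable and buildx is installed.
check_multi_arch() {
  image="$1"
  out=$(docker buildx imagetools inspect "$image" 2>/dev/null) || {
    echo "UNKNOWN $image"
    return 1
  }
  for platform in linux/amd64 linux/arm64; do
    echo "$out" | grep -q "$platform" || {
      echo "MISSING $platform: $image"
      return 1
    }
  done
  echo "OK $image"
}

# Example usage, run from <HUDI_REPO_DIR>/docker:
#   list_base_images hoodie/hadoop | while read -r img; do check_multi_arch "$img"; done
```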
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
docker/README.md:
##########
@@ -43,6 +43,21 @@ To build all docker images locally, you can run the script:
 ./build_local_docker_images.sh
 ```
 
+To build the Docker demo images directly with the script, run it from under `<HUDI_REPO_DIR>/docker`:
+
+```shell
+./build_docker_images.sh
+```
+
+You can override the Hadoop, Spark, and Hive versions from the command line:
+
+```shell

Review Comment:
   🤖 The example uses `--hadoop-version 3.4.0 --spark-version 4.0.1 
--hive-version 3.1.3`, but the script defaults are Hadoop 2.8.4 / Spark 3.5.3 / 
Hive 2.3.10 and there's no validation of supported combinations. Have these 
specific versions been verified to build successfully end-to-end (Dockerfile 
downloads, Java base image selection, demo compose files, etc.)? If not, it 
might be safer to use a combination known to work, or add a note that arbitrary 
version combinations are not guaranteed to be supported.
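
If validation were added, it could be as small as an allowlist check before the build starts. This is a sketch only: the function name is hypothetical, nothing like it is known to exist in the script, and the single entry listed is the default combination quoted above.

```shell
# Return 0 if the Hadoop/Spark/Hive combination is on the verified list.
# Only the script defaults quoted in this comment are listed; the list
# would grow as further combinations are built and tested end-to-end.
is_verified_combo() {
  case "$1/$2/$3" in
    2.8.4/3.5.3/2.3.10) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage: warn, but do not fail, on an unverified combination.
#   is_verified_combo "$hadoop" "$spark" "$hive" \
#     || echo "warning: version combination not verified" >&2
```

Warning instead of failing keeps arbitrary combinations possible while making it clear they are not guaranteed to work.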
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
