This is an automated email from the ASF dual-hosted git repository.

felixybw pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git


The following commit(s) were added to refs/heads/main by this push:
     new f1d664bc39 [VL] Update document of build gluten in Docker (#8459)
f1d664bc39 is described below

commit f1d664bc397cc049933060487504bc1c6979a0ba
Author: BInwei Yang <[email protected]>
AuthorDate: Wed Jan 8 18:29:15 2025 -0800

    [VL] Update document of build gluten in Docker (#8459)
    
    Add the details to build Gluten in docker.
---
 docs/developers/velox-backend-build-in-docker.md | 67 ++++++++++++++++++++----
 docs/get-started/Velox.md                        | 18 ++++---
 2 files changed, 67 insertions(+), 18 deletions(-)

diff --git a/docs/developers/velox-backend-build-in-docker.md 
b/docs/developers/velox-backend-build-in-docker.md
index 4820c7cdc7..4d5a32767f 100755
--- a/docs/developers/velox-backend-build-in-docker.md
+++ b/docs/developers/velox-backend-build-in-docker.md
@@ -5,17 +5,64 @@ nav_order: 7
 parent: Developer Overview
 ---
 
-Currently, Centos-7/8/9 and Ubuntu 20.04/22.04 are supported to build Gluten 
Velox backend. Please refer to
-`.github/workflows/velox_weekly.yml` to install required tools before the 
build.
+Currently, there are two ways to build Gluten: static linking or dynamic linking.
 
-There are two docker images with almost all dependencies installed, respective 
for static build and dynamic build.
-The according Dockerfiles are respectively `Dockerfile.centos7-static-build` 
and `Dockerfile.centos8-dynamic-build`
-under `dev/docker/`.
+# Static link
+The static link approach builds all dependency libraries in vcpkg for both 
Velox and Gluten. It then statically links these libraries into libvelox.so and 
libgluten.so, enabling the build of Gluten on *any* Linux OS on x86 platforms 
with 64G memory. However, it has only been verified on Centos-7/8/9 and Ubuntu 
20.04/22.04. Please submit an issue if it fails on your OS.
 
-```shell
-# For static build on centos-7.
-docker pull apache/gluten:vcpkg-centos-7
+The following dependency libraries are required on the target system. They 
are essential libraries pre-installed in every Linux OS.
+```
+linux-vdso.so.1
+librt.so.1
+libpthread.so.0
+libdl.so.2
+libm.so.6
+libc.so.6
+/lib64/ld-linux-x86-64.so.2
+```
+
+The 'dockerfile' to build Gluten jar:
+
+```
+FROM apache/gluten:vcpkg-centos-7
 
-# For dynamic build on centos-8.
-docker pull apache/gluten:centos-8 (dynamic build)
+# Build Gluten Jar
+RUN source /opt/rh/devtoolset-11/enable && \
+    git clone https://github.com/apache/incubator-gluten.git && \
+    cd incubator-gluten && \
+    ./dev/builddeps-veloxbe.sh --run_setup_script=OFF --enable_s3=ON 
--enable_gcs=ON --enable_abfs=ON --enable_vcpkg=ON --build_arrow=OFF && \
+    mvn clean package -Pbackends-velox -Pceleborn -Piceberg -Pdelta 
-Pspark-3.4 -DskipTests
+```
+`enable_vcpkg=ON` enables static linking. Vcpkg packages are pre-installed in 
the vcpkg-centos-7 image and can be reused automatically. The image is 
maintained by the Gluten community.
+
+The following command builds the Gluten jar in the 'glutenimage' image:
+```
+docker build -t glutenimage -f dockerfile .
+```
+The gluten jar can be copied from 
glutenimage:/incubator-gluten/package/target/gluten-velox-bundle-*.jar
+
+# Dynamic link
+The dynamic link approach requires the dependency libraries to be installed. 
It then dynamically links the .so files into libvelox.so and libgluten.so. 
Currently, Centos-7/8/9 and
+ Ubuntu 20.04/22.04 are supported for building the Gluten Velox backend 
dynamically.
+
+The 'dockerfile' to build Gluten jar:
+
+```
+FROM apache/gluten:centos-8
+
+# Build Gluten Jar
+RUN source /opt/rh/devtoolset-11/enable && \
+    git clone https://github.com/apache/incubator-gluten.git && \
+    cd incubator-gluten && \
+    ./dev/builddeps-veloxbe.sh --run_setup_script=ON --enable_hdfs=ON 
--enable_vcpkg=OFF --build_arrow=OFF && \
+    mvn clean package -Pbackends-velox -Pceleborn -Piceberg -Pdelta 
-Pspark-3.4 -DskipTests && \
+    ./dev/build-thirdparty.sh
+```
+`enable_vcpkg=OFF` enables dynamic linking. Some of the shared libraries are 
pre-installed in the image. You need to specify `--run_setup_script=ON` to 
install the rest of them. `build-thirdparty.sh` then packages all dependency 
libraries into a jar.
+Please note that the image is based on centos-8. Building and deploying the 
jar on other OSes is risky.
+
+The following command builds the Gluten jar in the 'glutenimage' image:
+```
+docker build -t glutenimage -f dockerfile .
 ```
+The gluten jar can be copied from 
glutenimage:/incubator-gluten/package/target/gluten-velox-bundle-*.jar and 
glutenimage:/incubator-gluten/package/target/gluten-thirdparty-lib-*.jar
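
Note on copying the jars out of the image: `docker cp` operates on containers rather than images, so one possible approach is to create a stopped container from the image first. The sketch below assumes the image name `glutenimage` and the `/incubator-gluten/package/target` path from the dockerfiles above. The function name `extract_gluten_jars` and the `./gluten-jars` output directory are hypothetical, and the whole `target` directory is copied because `docker cp` does not expand wildcards.

```shell
# Copy the contents of package/target out of an image built from the
# dockerfiles above. Function name and output directory are illustrative.
extract_gluten_jars() {
  image=$1
  dest=$2
  mkdir -p "$dest" || return 1
  # Create a stopped container so its filesystem can be read with docker cp.
  cid=$(docker create "$image") || return 1
  # A trailing "/." copies the directory contents into the existing $dest.
  docker cp "$cid:/incubator-gluten/package/target/." "$dest"
  status=$?
  # Remove the temporary container whether or not the copy succeeded.
  docker rm "$cid" >/dev/null
  return "$status"
}

# Usage:
#   extract_gluten_jars glutenimage ./gluten-jars
#   ls ./gluten-jars/gluten-*.jar
```

For a dynamic build, the same directory should also contain the `gluten-thirdparty-lib-*.jar` mentioned above.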
diff --git a/docs/get-started/Velox.md b/docs/get-started/Velox.md
index 48bca9a6d3..863e996796 100644
--- a/docs/get-started/Velox.md
+++ b/docs/get-started/Velox.md
@@ -16,8 +16,7 @@ parent: Getting-Started
 
 # Prerequisite
 
-Currently, Gluten+Velox backend is only tested on 
**Ubuntu20.04/Ubuntu22.04/Centos7/Centos8**.
-Other kinds of OS support are still in progress. The long term goal is to 
support several common OS and conda env deployment.
+Currently, with a static build, the Gluten+Velox backend supports all Linux 
OSes, but is only tested on **Ubuntu20.04/Ubuntu22.04/Centos7/Centos8**. With a 
dynamic build, the Gluten+Velox backend supports 
**Ubuntu20.04/Ubuntu22.04/Centos7/Centos8** and their variants.
 
 Currently, the officially supported Spark versions are 3.2.2, 3.3.1, 3.4.3 and 
3.5.1.
 
@@ -103,20 +102,23 @@ mvn clean package -Pbackends-velox -Pceleborn -Puniffle 
-Pspark-3.4 -DskipTests
 mvn clean package -Pbackends-velox -Pceleborn -Puniffle -Pspark-3.5 -DskipTests
 ```
 
-Notes: Building Velox may fail caused by OOM. You can prevent this failure by 
adjusting `NUM_THREADS` (e.g., `export NUM_THREADS=4`) before building 
Gluten/Velox.
+Notes: Building Velox may fail due to OOM. You can prevent this failure by 
adjusting `NUM_THREADS` (e.g., `export NUM_THREADS=4`) before building 
Gluten/Velox. The recommended minimum memory size is 64G.
 
 After the above build process, the Jar file will be generated under 
`package/target/`.
 
+Alternatively, you may refer to [build in 
docker](docs/developers/velox-backend-build-in-docker.md) to build the Gluten 
jar in Docker.
+
 ## Dependency library deployment
 
 With build option `enable_vcpkg=ON`, all dependency libraries will be 
statically linked to `libvelox.so` and `libgluten.so` which are packed into the 
gluten-jar.
 In this way, only the gluten-jar is needed to add to 
`spark.<driver|executor>.extraClassPath` and spark will deploy the jar to each 
worker node. It's better to build
-the static version using a clean docker image without any extra libraries 
installed. On host with some libraries like jemalloc installed, the script may 
crash with
-odd message. You may need to uninstall those libraries to get a clean host. We 
strongly recommend user to build Gluten in this way to avoid dependency lacking 
issue.
+the static version using a clean docker image without any extra libraries 
installed ([build in docker](docs/developers/velox-backend-build-in-docker.md)). 
On a host with
+some libraries like jemalloc installed, the script may crash with an odd 
message. You may need to uninstall those libraries to get a clean host. We 
**strongly recommend** that users build Gluten in this way to avoid 
missing-dependency issues.
 
-With build option `enable_vcpkg=OFF`, not all dependency libraries will be 
statically linked. You need to separately execute `./dev/build-thirdparty.sh` 
to pack required
-shared libraries into another jar named 
`gluten-thirdparty-lib-$LINUX_OS-$VERSION-$ARCH.jar`. Then you need to add the 
jar to Spark config `extraClassPath` and set
-`spark.gluten.loadLibFromJar=true`. Otherwise, you need to install required 
shared libraries on each worker node. You may find the libraries list from the 
third-party jar.
+With build option `enable_vcpkg=OFF`, not all dependency libraries will be 
dynamically linked. After building, you need to separately execute 
`./dev/build-thirdparty.sh` to 
+pack required shared libraries into another jar named 
`gluten-thirdparty-lib-$LINUX_OS-$VERSION-$ARCH.jar`. Then you need to add the 
jar to Spark config `extraClassPath` and 
+set `spark.gluten.loadLibFromJar=true`. Otherwise, you need to install the 
required shared libraries with **exactly the same versions** on each worker 
node. You may find the 
+list of libraries in the third-party jar.
 
 ## HDFS support
 
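
Note on verifying a static build: the first document lists the only shared libraries a statically linked build should need on the target system. A rough way to check a built library is to filter `ldd` output against that list. In the sketch below, the allowed list is copied from the document. The function name `unexpected_deps` and the library path in the usage comment are hypothetical.

```shell
# Base libraries listed in the document as pre-installed on the target system.
allowed="linux-vdso.so.1 librt.so.1 libpthread.so.0 libdl.so.2 libm.so.6 libc.so.6 ld-linux-x86-64.so.2"

# Reads ldd-style output on stdin and prints every library that is not in
# the allowed list. Prints nothing when only base libraries are needed.
unexpected_deps() {
  awk 'NF { print $1 }' | while read -r lib; do
    name=${lib##*/}   # strip any directory, e.g. /lib64/ld-linux-x86-64.so.2
    case " $allowed " in
      *" $name "*) ;;        # expected base library
      *) echo "$name" ;;     # anything else is an extra runtime dependency
    esac
  done
}

# Usage (hypothetical path to the built library):
#   ldd /path/to/libvelox.so | unexpected_deps
```

Empty output suggests the library depends only on the base set. Any name printed is a shared library that would also have to exist on each worker node.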

