This is an automated email from the ASF dual-hosted git repository.
kerwinzhang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-gluten.git
The following commit(s) were added to refs/heads/main by this push:
new c5ce55228 [CELEBORN][DOC] Fix Celeborn support of get-started document (#6282)
c5ce55228 is described below
commit c5ce552280976eba884bcae36bb02b8afc355017
Author: Nicholas Jiang <[email protected]>
AuthorDate: Mon Jul 1 14:49:03 2024 +0700
[CELEBORN][DOC] Fix Celeborn support of get-started document (#6282)
---
docs/get-started/ClickHouse.md | 33 +++++++++++----------------------
docs/get-started/Velox.md | 4 ++--
docs/get-started/build-guide.md | 23 ++++++++++++-----------
3 files changed, 25 insertions(+), 35 deletions(-)
diff --git a/docs/get-started/ClickHouse.md b/docs/get-started/ClickHouse.md
index ab24de7a4..38ce048fe 100644
--- a/docs/get-started/ClickHouse.md
+++ b/docs/get-started/ClickHouse.md
@@ -629,19 +629,26 @@ public read-only account:gluten/hN2xX3uQ4m
### Celeborn support
-Gluten with clickhouse backend has not yet supportted [Celeborn](https://github.com/apache/celeborn) natively as remote shuffle service using columar shuffle. However, you can still use Celeborn with row shuffle, which means a ColumarBatch will be converted to a row during shuffle.
-Below introduction is used to enable this feature:
+Gluten with clickhouse backend supports [Celeborn](https://github.com/apache/celeborn) as remote shuffle service. Currently, the supported Celeborn versions are `0.3.x` and `0.4.0`.
+
+Below introduction is used to enable this feature.
First refer to this URL(https://github.com/apache/celeborn) to setup a celeborn cluster.
+When compiling the Gluten Java module, it's required to enable `celeborn` profile, as follows:
+
+```
+mvn clean package -Pbackends-clickhouse -Pspark-3.3 -Pceleborn -DskipTests
+```
+
Then add the Spark Celeborn Client packages to your Spark application's classpath(usually add them into `$SPARK_HOME/jars`).
- Celeborn: celeborn-client-spark-3-shaded_2.12-[celebornVersion].jar
-Currently to use Celeborn following configurations are required in `spark-defaults.conf`
+Currently to use Gluten following configurations are required in `spark-defaults.conf`
```
-spark.shuffle.manager org.apache.spark.shuffle.celeborn.SparkShuffleManager
+spark.shuffle.manager org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
# celeborn master
spark.celeborn.master.endpoints clb-master:9097
@@ -670,24 +677,6 @@ spark.celeborn.storage.hdfs.dir hdfs://<namenode>/celeborn
spark.dynamicAllocation.enabled false
```
-#### Celeborn Columnar Shuffle Support
-Currently, the supported Celeborn versions are `0.3.x` and `0.4.0`.
-The native Celeborn support can be enabled by the following configuration
-```
-spark.shuffle.manager=org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
-```
-
-quickly start a celeborn cluster
-```shell
-wget https://archive.apache.org/dist/celeborn/celeborn-0.3.2-incubating/apache-celeborn-0.3.2-incubating-bin.tgz && \
-tar -zxvf apache-celeborn-0.3.2-incubating-bin.tgz && \
-mv apache-celeborn-0.3.2-incubating-bin/conf/celeborn-defaults.conf.template apache-celeborn-0.3.2-incubating-bin/conf/celeborn-defaults.conf && \
-mv apache-celeborn-0.3.2-incubating-bin/conf/log4j2.xml.template apache-celeborn-0.3.2-incubating-bin/conf/log4j2.xml && \
-mkdir /opt/hadoop && chmod 777 /opt/hadoop && \
-echo -e "celeborn.worker.flusher.threads 4\nceleborn.worker.storage.dirs /tmp\nceleborn.worker.monitor.disk.enabled false" > apache-celeborn-0.3.2-incubating-bin/conf/celeborn-defaults.conf && \
-bash apache-celeborn-0.3.2-incubating-bin/sbin/start-master.sh && bash apache-celeborn-0.3.2-incubating-bin/sbin/start-worker.sh
-```
-
### Columnar shuffle mode
We have two modes of columnar shuffle
1. prefer cache
diff --git a/docs/get-started/Velox.md b/docs/get-started/Velox.md
index d65b94fc1..5f9ae2a46 100644
--- a/docs/get-started/Velox.md
+++ b/docs/get-started/Velox.md
@@ -224,11 +224,11 @@ Currently there are several ways to asscess S3 in Spark. Please refer [Velox S3]
Gluten with velox backend supports [Celeborn](https://github.com/apache/celeborn) as remote shuffle service. Currently, the supported Celeborn versions are `0.3.x` and `0.4.0`.
-Below introduction is used to enable this feature
+Below introduction is used to enable this feature.
First refer to this URL(https://github.com/apache/celeborn) to setup a celeborn cluster.
-When compiling the Gluten Java module, it's required to enable `rss` profile, as follows:
+When compiling the Gluten Java module, it's required to enable `celeborn` profile, as follows:
```
mvn clean package -Pbackends-velox -Pspark-3.3 -Pceleborn -DskipTests
diff --git a/docs/get-started/build-guide.md b/docs/get-started/build-guide.md
index b2e4b9560..dc4989bc8 100644
--- a/docs/get-started/build-guide.md
+++ b/docs/get-started/build-guide.md
@@ -55,17 +55,18 @@ Please set them via `--`, e.g., `--velox_home=/YOUR/PATH`.
### Maven build parameters
The below parameters can be set via `-P` for mvn.
-| Parameters | Description | Default state |
-|---------------------|------------------------------------------------------------------------------|---------------|
-| backends-velox | Build Gluten Velox backend. | disabled |
-| backends-clickhouse | Build Gluten ClickHouse backend. | disabled |
-| rss | Build Gluten with Remote Shuffle Service, only applicable for Velox backend. | disabled |
-| delta | Build Gluten with Delta Lake support. | disabled |
-| iceberg | Build Gluten with Iceberg support. | disabled |
-| spark-3.2 | Build Gluten for Spark 3.2. | enabled |
-| spark-3.3 | Build Gluten for Spark 3.3. | disabled |
-| spark-3.4 | Build Gluten for Spark 3.4. | disabled |
-| spark-3.5 | Build Gluten for Spark 3.5. | disabled |
+| Parameters | Description | Default state |
+|---------------------|---------------------------------------|---------------|
+| backends-velox | Build Gluten Velox backend. | disabled |
+| backends-clickhouse | Build Gluten ClickHouse backend. | disabled |
+| celeborn | Build Gluten with Celeborn. | disabled |
+| uniffle | Build Gluten with Uniffle. | disabled |
+| delta | Build Gluten with Delta Lake support. | disabled |
+| iceberg | Build Gluten with Iceberg support. | disabled |
+| spark-3.2 | Build Gluten for Spark 3.2. | enabled |
+| spark-3.3 | Build Gluten for Spark 3.3. | disabled |
+| spark-3.4 | Build Gluten for Spark 3.4. | disabled |
+| spark-3.5 | Build Gluten for Spark 3.5. | disabled |
## Gluten Jar for Deployment
The gluten jar built out is under `GLUTEN_SRC/package/target/`.