This is an automated email from the ASF dual-hosted git repository.
sunchao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion-comet.git
The following commit(s) were added to refs/heads/main by this push:
new 9ab6c75 feat: Document the class path / classloader issue with the shuffle manager (#256)
9ab6c75 is described below
commit 9ab6c75f41456234f2fb93fcec15ff3cd435f49e
Author: Holden Karau <[email protected]>
AuthorDate: Sat Apr 13 09:16:34 2024 -0700
feat: Document the class path / classloader issue with the shuffle manager (#256)
---
README.md | 8 ++++++++
.../apache/spark/shuffle/sort/CometShuffleExternalSorter.java | 9 ++++++++-
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 3b903b1..121972c 100644
--- a/README.md
+++ b/README.md
@@ -127,6 +127,14 @@ Comet shuffle feature is disabled by default. To enable it, please add related c
Above configs enable Comet native shuffle, which only supports hash partitioning and single partition.
Comet native shuffle doesn't support complex types yet.
+Comet doesn't have an official release yet, so currently the only way to test it is to build the jar and include it in your Spark application. Depending on your deployment mode, you may also need to set the driver and executor class paths to explicitly include Comet. Otherwise Spark may load the Comet components with a different class loader than its internal components, and Comet shuffle will then fail at runtime (for example with an `IllegalAccessError`). For example:
+
+```
+--driver-class-path spark/target/comet-spark-spark3.4_2.12-0.1.0-SNAPSHOT.jar
+```
+
+Some cluster managers may require additional configuration; see https://spark.apache.org/docs/latest/cluster-overview.html
+
To enable columnar shuffle, which supports all partitioning and basic complex types, one more config is required:
```
--conf spark.comet.columnar.shuffle.enabled=true
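The class-loader failure described in the README hunk above follows from a JVM rule: package-private access is only allowed within one runtime package, and the JVM identifies a runtime package by its package name *and* its defining class loader. `CometShuffleExternalSorter` lives in `org.apache.spark.shuffle.sort`, presumably so that it can use Spark's package-private `ShuffleInMemorySorter`. If the Comet jar is defined by a different loader than Spark's own jars, that access is rejected. The following is a minimal diagnostic sketch, assuming you can run code on the driver or an executor. `RuntimePackageCheck`, `sameRuntimePackage` and `describeLoader` are hypothetical names, not part of Comet or Spark.

```java
import java.util.Objects;

/**
 * Hypothetical diagnostic helper (not part of Comet or Spark). Package-private access in the
 * JVM only works inside one runtime package: same package name AND same defining class loader.
 * This class checks both conditions for two given classes.
 */
final class RuntimePackageCheck {

  private RuntimePackageCheck() {}

  /** Package name of a (possibly nested) class, or "" for the default package. */
  static String packageName(Class<?> c) {
    String name = c.getName(); // e.g. "java.util.Map$Entry"
    int lastDot = name.lastIndexOf('.');
    return lastDot < 0 ? "" : name.substring(0, lastDot);
  }

  /** True only if both classes have the same package name and the same defining loader. */
  static boolean sameRuntimePackage(Class<?> a, Class<?> b) {
    // getClassLoader() returns null for bootstrap classes; Objects.equals handles null.
    return packageName(a).equals(packageName(b))
        && Objects.equals(a.getClassLoader(), b.getClassLoader());
  }

  /** Describes which loader defined a class, useful when logging a suspected mismatch. */
  static String describeLoader(Class<?> c) {
    ClassLoader loader = c.getClassLoader();
    return c.getName() + " loaded by " + (loader == null ? "<bootstrap>" : loader.toString());
  }

  public static void main(String[] args) {
    // Both in java.lang and both defined by the bootstrap loader.
    System.out.println(sameRuntimePackage(String.class, Integer.class)); // true
    // Same (bootstrap) loader, but different package names.
    System.out.println(sameRuntimePackage(String.class, java.util.List.class)); // false
    System.out.println(describeLoader(RuntimePackageCheck.class));
  }
}
```

Inside a Spark job, one could pass `Class.forName("org.apache.spark.shuffle.sort.ShuffleInMemorySorter")` and `org.apache.spark.shuffle.sort.CometShuffleExternalSorter.class` and log `describeLoader` for both. A `false` result would be consistent with the class path problem the README describes.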
diff --git a/spark/src/main/java/org/apache/spark/shuffle/sort/CometShuffleExternalSorter.java b/spark/src/main/java/org/apache/spark/shuffle/sort/CometShuffleExternalSorter.java
index 9fe88ec..4417c4f 100644
--- a/spark/src/main/java/org/apache/spark/shuffle/sort/CometShuffleExternalSorter.java
+++ b/spark/src/main/java/org/apache/spark/shuffle/sort/CometShuffleExternalSorter.java
@@ -431,7 +431,14 @@ public final class CometShuffleExternalSorter implements CometShuffleChecksumSup
     // We cannot access the address of the internal array in the sorter, so we need to
     // allocate the array manually and expand the pointer array in the sorter.
     // We don't want the in-memory sorter to allocate memory, but the initial size cannot be zero.
-    this.inMemSorter = new ShuffleInMemorySorter(allocator, 1, true);
+    try {
+      this.inMemSorter = new ShuffleInMemorySorter(allocator, 1, true);
+    } catch (java.lang.IllegalAccessError e) {
+      throw new java.lang.RuntimeException(
+          "Error loading in-memory sorter; check class path -- see "
+              + "https://github.com/apache/arrow-datafusion-comet?tab=readme-ov-file#enable-comet-shuffle",
+          e);
+    }
     sorterArray = allocator.allocateArray(initialSize);
     this.inMemSorter.expandPointerArray(sorterArray);
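The Java change turns a bare `IllegalAccessError` from the `ShuffleInMemorySorter` constructor into a `RuntimeException` that names the likely cause and links to the README, while keeping the original error as the cause. If other package-private Spark classes ever needed the same guard, the catch-and-rethrow could be factored into a helper. The following is a minimal sketch of that pattern only. `ClassPathHints` and `withClassPathHint` are hypothetical names, not Comet APIs, and the failure in `main` is simulated rather than produced by a real class-loader mismatch.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not part of Comet: attaches a class path hint to IllegalAccessError. */
final class ClassPathHints {

  // Same README anchor as the error message in the diff.
  static final String README_HINT =
      "https://github.com/apache/arrow-datafusion-comet?tab=readme-ov-file#enable-comet-shuffle";

  private ClassPathHints() {}

  /**
   * Runs the factory. If the JVM rejects a package-private access (IllegalAccessError), rethrows
   * as a RuntimeException pointing at the docs, preserving the original error as the cause.
   */
  static <T> T withClassPathHint(String what, Supplier<T> factory) {
    try {
      return factory.get();
    } catch (IllegalAccessError e) {
      throw new RuntimeException(
          "Error loading " + what + "; check class path -- see " + README_HINT, e);
    }
  }

  public static void main(String[] args) {
    // Normal case: the value is passed through unchanged.
    System.out.println(withClassPathHint("test value", () -> "ok")); // ok

    // Failure case: simulate the error a mismatched class loader would raise.
    try {
      withClassPathHint("in-memory sorter", () -> {
        throw new IllegalAccessError("simulated");
      });
    } catch (RuntimeException e) {
      System.out.println(e.getCause() instanceof IllegalAccessError); // true
    }
  }
}
```

Applied to the code in this diff, the call would read roughly `this.inMemSorter = ClassPathHints.withClassPathHint("in-memory sorter", () -> new ShuffleInMemorySorter(allocator, 1, true));`, assuming `allocator` is a field or effectively final. Catching only `IllegalAccessError`, as the commit does, keeps other `Error`s from being reported with this specific class path hint.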