This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 36dce360 Publish built docs triggered by
507e475bd06fbb9cc738d61473a8341caa6daf74
36dce360 is described below
commit 36dce3602959d63efef9625535b6b84c2b1c5067
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu May 23 21:08:11 2024 +0000
Publish built docs triggered by 507e475bd06fbb9cc738d61473a8341caa6daf74
---
_sources/user-guide/configs.md.txt | 2 +-
_sources/user-guide/tuning.md.txt | 22 +++++++++++---------
searchindex.js | 2 +-
user-guide/configs.html | 26 ++++++++++++------------
user-guide/tuning.html | 41 ++++++++++++++++++++++++++++----------
5 files changed, 59 insertions(+), 34 deletions(-)
diff --git a/_sources/user-guide/configs.md.txt
b/_sources/user-guide/configs.md.txt
index 0204b0c5..eb349b34 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -29,7 +29,6 @@ Comet provides the following configuration settings.
| spark.comet.columnar.shuffle.async.enabled | Whether to enable asynchronous
shuffle for Arrow-based shuffle. By default, this config is false. | false |
| spark.comet.columnar.shuffle.async.max.thread.num | Maximum number of
threads on an executor used for Comet async columnar shuffle. By default, this
config is 100. This is the upper bound of total number of shuffle threads per
executor. In other words, if the number of cores * the number of shuffle
threads per task `spark.comet.columnar.shuffle.async.thread.num` is larger than
this config. Comet will use this config as the number of shuffle threads per
executor instead. | 100 |
| spark.comet.columnar.shuffle.async.thread.num | Number of threads used for
Comet async columnar shuffle per shuffle task. By default, this config is 3.
Note that more threads means more memory requirement to buffer shuffle data
before flushing to disk. Also, more threads may not always improve performance,
and should be set based on the number of cores available. | 3 |
-| spark.comet.columnar.shuffle.enabled | Whether to enable Arrow-based
columnar shuffle for Comet and Spark regular operators. If this is enabled,
Comet prefers columnar shuffle than native shuffle. By default, this config is
true. | true |
| spark.comet.columnar.shuffle.memory.factor | Fraction of Comet memory to be
allocated per executor process for Comet shuffle. Comet memory size is
specified by `spark.comet.memoryOverhead` or calculated by
`spark.comet.memory.overhead.factor` * `spark.executor.memory`. By default,
this config is 1.0. | 1.0 |
| spark.comet.debug.enabled | Whether to enable debug mode for Comet. By
default, this config is false. When enabled, Comet will do additional checks
for debugging purpose. For example, validating array when importing arrays from
JVM at native side. Note that these checks may be expensive in performance and
should only be enabled for debugging purpose. | false |
| spark.comet.enabled | Whether to enable Comet extension for Spark. When this
is turned on, Spark will use Comet to read Parquet data source. Note that to
enable native vectorized execution, both this config and
'spark.comet.exec.enabled' need to be enabled. By default, this config is the
value of the env var `ENABLE_COMET` if set, or true otherwise. | true |
@@ -39,6 +38,7 @@ Comet provides the following configuration settings.
| spark.comet.exec.memoryFraction | The fraction of memory from Comet memory
overhead that the native memory manager can use for execution. The purpose of
this config is to set aside memory for untracked data structures, as well as
imprecise size estimation during memory acquisition. Default value is 0.7. |
0.7 |
| spark.comet.exec.shuffle.codec | The codec of Comet native shuffle used to
compress shuffle data. Only zstd is supported. | zstd |
| spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle.
By default, this config is false. Note that this requires setting
'spark.shuffle.manager' to
'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'.
'spark.shuffle.manager' must be set before starting the Spark application and
cannot be changed during the application. | false |
+| spark.comet.exec.shuffle.mode | The mode of Comet shuffle. This config is
only effective if Comet shuffle is enabled. Available modes are 'native',
'jvm', and 'auto'. 'native' is for native shuffle which has best performance in
general. 'jvm' is for jvm-based columnar shuffle which has higher coverage than
native shuffle. 'auto' is for Comet to choose the best shuffle mode based on
the query plan. By default, this config is 'jvm'. | jvm |
| spark.comet.explainFallback.enabled | When this setting is enabled, Comet
will provide logging explaining the reason(s) why a query stage cannot be
executed natively. | false |
| spark.comet.memory.overhead.factor | Fraction of executor memory to be
allocated as additional non-heap memory per executor process for Comet. Default
value is 0.2. | 0.2 |
| spark.comet.memory.overhead.min | Minimum amount of additional memory to be
allocated per executor process for Comet, in MiB. | 402653184b |
diff --git a/_sources/user-guide/tuning.md.txt
b/_sources/user-guide/tuning.md.txt
index 01fa7bdb..5a3100bd 100644
--- a/_sources/user-guide/tuning.md.txt
+++ b/_sources/user-guide/tuning.md.txt
@@ -39,22 +39,26 @@ It must be set before the Spark context is created. You can
enable or disable Co
at runtime by setting `spark.comet.exec.shuffle.enabled` to `true` or `false`.
Once it is disabled, Comet will fallback to the default Spark shuffle manager.
-### Columnar Shuffle
+### Shuffle Mode
-By default, once `spark.comet.exec.shuffle.enabled` is enabled, Comet uses
columnar shuffle
+Comet provides three shuffle modes: Columnar Shuffle, Native Shuffle and Auto
Mode.
+
+#### Columnar Shuffle
+
+By default, once `spark.comet.exec.shuffle.enabled` is enabled, Comet uses
JVM-based columnar shuffle
to improve the performance of shuffle operations. Columnar shuffle supports
HashPartitioning,
-RoundRobinPartitioning, RangePartitioning and SinglePartitioning.
+RoundRobinPartitioning, RangePartitioning and SinglePartitioning. This mode
has the highest
+query coverage.
-Columnar shuffle can be disabled by setting
`spark.comet.columnar.shuffle.enabled` to `false`.
+Columnar shuffle can be enabled by setting `spark.comet.exec.shuffle.mode` to
`jvm`.
-### Native Shuffle
+#### Native Shuffle
Comet also provides a fully native shuffle implementation that can be used to
improve the performance.
-To enable native shuffle, just disable `spark.comet.columnar.shuffle.enabled`.
+To enable native shuffle, just set `spark.comet.exec.shuffle.mode` to `native`
Native shuffle only supports HashPartitioning and SinglePartitioning.
+### Auto Mode
-
-
-
+`spark.comet.exec.shuffle.mode` to `auto` will let Comet choose the best
shuffle mode based on the query plan.
diff --git a/searchindex.js b/searchindex.js
index bbcc1a9c..3b9b5384 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"ANSI mode": [[8, "ansi-mode"]], "API
Differences Between Spark Versions": [[0,
"api-differences-between-spark-versions"]], "ASF Links": [[7, null]], "Adding
Spark-side Tests for the New Expression": [[0,
"adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression":
[[0, "adding-a-new-expression"]], "Adding a New Scalar Function Expression":
[[0, "adding-a-new-scalar-function-expression"]], "Adding the Expression To the
Protobuf Definition" [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"ANSI mode": [[8, "ansi-mode"]], "API
Differences Between Spark Versions": [[0,
"api-differences-between-spark-versions"]], "ASF Links": [[7, null]], "Adding
Spark-side Tests for the New Expression": [[0,
"adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression":
[[0, "adding-a-new-expression"]], "Adding a New Scalar Function Expression":
[[0, "adding-a-new-scalar-function-expression"]], "Adding the Expression To the
Protobuf Definition" [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 3af81583..f248982d 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -337,46 +337,46 @@ under the License.
<td><p>Number of threads used for Comet async columnar shuffle per shuffle
task. By default, this config is 3. Note that more threads means more memory
requirement to buffer shuffle data before flushing to disk. Also, more threads
may not always improve performance, and should be set based on the number of
cores available.</p></td>
<td><p>3</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.columnar.shuffle.enabled</p></td>
-<td><p>Whether to enable Arrow-based columnar shuffle for Comet and Spark
regular operators. If this is enabled, Comet prefers columnar shuffle than
native shuffle. By default, this config is true.</p></td>
-<td><p>true</p></td>
-</tr>
-<tr class="row-odd"><td><p>spark.comet.columnar.shuffle.memory.factor</p></td>
+<tr class="row-even"><td><p>spark.comet.columnar.shuffle.memory.factor</p></td>
<td><p>Fraction of Comet memory to be allocated per executor process for Comet
shuffle. Comet memory size is specified by <code class="docutils literal
notranslate"><span class="pre">spark.comet.memoryOverhead</span></code> or
calculated by <code class="docutils literal notranslate"><span
class="pre">spark.comet.memory.overhead.factor</span></code> * <code
class="docutils literal notranslate"><span
class="pre">spark.executor.memory</span></code>. By default, this config is
1.0.</p></td>
<td><p>1.0</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.debug.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.debug.enabled</p></td>
<td><p>Whether to enable debug mode for Comet. By default, this config is
false. When enabled, Comet will do additional checks for debugging purpose. For
example, validating array when importing arrays from JVM at native side. Note
that these checks may be expensive in performance and should only be enabled
for debugging purpose.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.enabled</p></td>
<td><p>Whether to enable Comet extension for Spark. When this is turned on,
Spark will use Comet to read Parquet data source. Note that to enable native
vectorized execution, both this config and ‘spark.comet.exec.enabled’ need to
be enabled. By default, this config is the value of the env var <code
class="docutils literal notranslate"><span
class="pre">ENABLE_COMET</span></code> if set, or true otherwise.</p></td>
<td><p>true</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.exceptionOnDatetimeRebase</p></td>
+<tr class="row-odd"><td><p>spark.comet.exceptionOnDatetimeRebase</p></td>
<td><p>Whether to throw exception when seeing dates/timestamps from the legacy
hybrid (Julian + Gregorian) calendar. Since Spark 3, dates/timestamps were
written according to the Proleptic Gregorian calendar. When this is true, Comet
will throw exceptions when seeing these dates/timestamps that were written by
Spark version before 3.0. If this is false, these dates/timestamps will be read
as if they were written to the Proleptic Gregorian calendar and will not be
rebased.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.exec.all.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.all.enabled</p></td>
<td><p>Whether to enable all Comet operators. By default, this config is
false. Note that this config precedes all separate config
‘spark.comet.exec.<operator_name>.enabled’. That being said, if this
config is enabled, separate configs are ignored.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.exec.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.exec.enabled</p></td>
<td><p>Whether to enable Comet native vectorized execution for Spark. This
controls whether Spark should convert operators into their Comet counterparts
and execute them in native space. Note: each operator is associated with a
separate config in the format of
‘spark.comet.exec.<operator_name>.enabled’ at the moment, and both the
config and this need to be turned on, in order for the operator to be executed
in native. By default, this config is false.</p></td>
<td><p>false</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.exec.memoryFraction</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.memoryFraction</p></td>
<td><p>The fraction of memory from Comet memory overhead that the native
memory manager can use for execution. The purpose of this config is to set
aside memory for untracked data structures, as well as imprecise size
estimation during memory acquisition. Default value is 0.7.</p></td>
<td><p>0.7</p></td>
</tr>
-<tr class="row-even"><td><p>spark.comet.exec.shuffle.codec</p></td>
+<tr class="row-odd"><td><p>spark.comet.exec.shuffle.codec</p></td>
<td><p>The codec of Comet native shuffle used to compress shuffle data. Only
zstd is supported.</p></td>
<td><p>zstd</p></td>
</tr>
-<tr class="row-odd"><td><p>spark.comet.exec.shuffle.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.shuffle.enabled</p></td>
<td><p>Whether to enable Comet native shuffle. By default, this config is
false. Note that this requires setting ‘spark.shuffle.manager’ to
‘org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager’.
‘spark.shuffle.manager’ must be set before starting the Spark application and
cannot be changed during the application.</p></td>
<td><p>false</p></td>
</tr>
+<tr class="row-odd"><td><p>spark.comet.exec.shuffle.mode</p></td>
+<td><p>The mode of Comet shuffle. This config is only effective if Comet
shuffle is enabled. Available modes are ‘native’, ‘jvm’, and ‘auto’. ‘native’
is for native shuffle which has best performance in general. ‘jvm’ is for
jvm-based columnar shuffle which has higher coverage than native shuffle.
‘auto’ is for Comet to choose the best shuffle mode based on the query plan. By
default, this config is ‘jvm’.</p></td>
+<td><p>jvm</p></td>
+</tr>
<tr class="row-even"><td><p>spark.comet.explainFallback.enabled</p></td>
<td><p>When this setting is enabled, Comet will provide logging explaining the
reason(s) why a query stage cannot be executed natively.</p></td>
<td><p>false</p></td>
diff --git a/user-guide/tuning.html b/user-guide/tuning.html
index 0f5c7a87..ec97f952 100644
--- a/user-guide/tuning.html
+++ b/user-guide/tuning.html
@@ -267,13 +267,25 @@ under the License.
</a>
<ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry">
- <a class="reference internal nav-link" href="#columnar-shuffle">
- Columnar Shuffle
+ <a class="reference internal nav-link" href="#shuffle-mode">
+ Shuffle Mode
</a>
+ <ul class="nav section-nav flex-column">
+ <li class="toc-h4 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#columnar-shuffle">
+ Columnar Shuffle
+ </a>
+ </li>
+ <li class="toc-h4 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#native-shuffle">
+ Native Shuffle
+ </a>
+ </li>
+ </ul>
</li>
<li class="toc-h3 nav-item toc-entry">
- <a class="reference internal nav-link" href="#native-shuffle">
- Native Shuffle
+ <a class="reference internal nav-link" href="#auto-mode">
+ Auto Mode
</a>
</li>
</ul>
@@ -340,20 +352,29 @@ The following sections describe the different shuffle
options available in Comet
It must be set before the Spark context is created. You can enable or disable
Comet shuffle
at runtime by setting <code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.enabled</span></code> to <code
class="docutils literal notranslate"><span class="pre">true</span></code> or
<code class="docutils literal notranslate"><span
class="pre">false</span></code>.
Once it is disabled, Comet will fallback to the default Spark shuffle
manager.</p>
+<section id="shuffle-mode">
+<h3>Shuffle Mode<a class="headerlink" href="#shuffle-mode" title="Link to this
heading">¶</a></h3>
+<p>Comet provides three shuffle modes: Columnar Shuffle, Native Shuffle and
Auto Mode.</p>
<section id="columnar-shuffle">
-<h3>Columnar Shuffle<a class="headerlink" href="#columnar-shuffle" title="Link
to this heading">¶</a></h3>
-<p>By default, once <code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.enabled</span></code> is enabled, Comet
uses columnar shuffle
+<h4>Columnar Shuffle<a class="headerlink" href="#columnar-shuffle" title="Link
to this heading">¶</a></h4>
+<p>By default, once <code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.enabled</span></code> is enabled, Comet
uses JVM-based columnar shuffle
to improve the performance of shuffle operations. Columnar shuffle supports
HashPartitioning,
-RoundRobinPartitioning, RangePartitioning and SinglePartitioning.</p>
-<p>Columnar shuffle can be disabled by setting <code class="docutils literal
notranslate"><span
class="pre">spark.comet.columnar.shuffle.enabled</span></code> to <code
class="docutils literal notranslate"><span class="pre">false</span></code>.</p>
+RoundRobinPartitioning, RangePartitioning and SinglePartitioning. This mode
has the highest
+query coverage.</p>
+<p>Columnar shuffle can be enabled by setting <code class="docutils literal
notranslate"><span class="pre">spark.comet.exec.shuffle.mode</span></code> to
<code class="docutils literal notranslate"><span
class="pre">jvm</span></code>.</p>
</section>
<section id="native-shuffle">
-<h3>Native Shuffle<a class="headerlink" href="#native-shuffle" title="Link to
this heading">¶</a></h3>
+<h4>Native Shuffle<a class="headerlink" href="#native-shuffle" title="Link to
this heading">¶</a></h4>
<p>Comet also provides a fully native shuffle implementation that can be used
to improve the performance.
-To enable native shuffle, just disable <code class="docutils literal
notranslate"><span
class="pre">spark.comet.columnar.shuffle.enabled</span></code>.</p>
+To enable native shuffle, just set <code class="docutils literal
notranslate"><span class="pre">spark.comet.exec.shuffle.mode</span></code> to
<code class="docutils literal notranslate"><span
class="pre">native</span></code></p>
<p>Native shuffle only supports HashPartitioning and SinglePartitioning.</p>
</section>
</section>
+<section id="auto-mode">
+<h3>Auto Mode<a class="headerlink" href="#auto-mode" title="Link to this
heading">¶</a></h3>
+<p><code class="docutils literal notranslate"><span
class="pre">spark.comet.exec.shuffle.mode</span></code> to <code
class="docutils literal notranslate"><span class="pre">auto</span></code> will
let Comet choose the best shuffle mode based on the query plan.</p>
+</section>
+</section>
</section>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]