(datafusion-comet) branch asf-site updated: Publish built docs triggered by 507e475bd06fbb9cc738d61473a8341caa6daf74

github-bot Thu, 23 May 2024 14:08:18 -0700

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 36dce360 Publish built docs triggered by 
507e475bd06fbb9cc738d61473a8341caa6daf74
36dce360 is described below

commit 36dce3602959d63efef9625535b6b84c2b1c5067
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Thu May 23 21:08:11 2024 +0000

    Publish built docs triggered by 507e475bd06fbb9cc738d61473a8341caa6daf74
---
 _sources/user-guide/configs.md.txt |  2 +-
 _sources/user-guide/tuning.md.txt  | 22 +++++++++++---------
 searchindex.js                     |  2 +-
 user-guide/configs.html            | 26 ++++++++++++------------
 user-guide/tuning.html             | 41 ++++++++++++++++++++++++++++----------
 5 files changed, 59 insertions(+), 34 deletions(-)

diff --git a/_sources/user-guide/configs.md.txt 
b/_sources/user-guide/configs.md.txt
index 0204b0c5..eb349b34 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -29,7 +29,6 @@ Comet provides the following configuration settings.
 | spark.comet.columnar.shuffle.async.enabled | Whether to enable asynchronous 
shuffle for Arrow-based shuffle. By default, this config is false. | false |
 | spark.comet.columnar.shuffle.async.max.thread.num | Maximum number of 
threads on an executor used for Comet async columnar shuffle. By default, this 
config is 100. This is the upper bound of total number of shuffle threads per 
executor. In other words, if the number of cores * the number of shuffle 
threads per task `spark.comet.columnar.shuffle.async.thread.num` is larger than 
this config. Comet will use this config as the number of shuffle threads per 
executor instead. | 100 |
 | spark.comet.columnar.shuffle.async.thread.num | Number of threads used for 
Comet async columnar shuffle per shuffle task. By default, this config is 3. 
Note that more threads means more memory requirement to buffer shuffle data 
before flushing to disk. Also, more threads may not always improve performance, 
and should be set based on the number of cores available. | 3 |
-| spark.comet.columnar.shuffle.enabled | Whether to enable Arrow-based 
columnar shuffle for Comet and Spark regular operators. If this is enabled, 
Comet prefers columnar shuffle than native shuffle. By default, this config is 
true. | true |
 | spark.comet.columnar.shuffle.memory.factor | Fraction of Comet memory to be 
allocated per executor process for Comet shuffle. Comet memory size is 
specified by `spark.comet.memoryOverhead` or calculated by 
`spark.comet.memory.overhead.factor` * `spark.executor.memory`. By default, 
this config is 1.0. | 1.0 |
 | spark.comet.debug.enabled | Whether to enable debug mode for Comet. By 
default, this config is false. When enabled, Comet will do additional checks 
for debugging purpose. For example, validating array when importing arrays from 
JVM at native side. Note that these checks may be expensive in performance and 
should only be enabled for debugging purpose. | false |
 | spark.comet.enabled | Whether to enable Comet extension for Spark. When this 
is turned on, Spark will use Comet to read Parquet data source. Note that to 
enable native vectorized execution, both this config and 
'spark.comet.exec.enabled' need to be enabled. By default, this config is the 
value of the env var `ENABLE_COMET` if set, or true otherwise. | true |
@@ -39,6 +38,7 @@ Comet provides the following configuration settings.
 | spark.comet.exec.memoryFraction | The fraction of memory from Comet memory 
overhead that the native memory manager can use for execution. The purpose of 
this config is to set aside memory for untracked data structures, as well as 
imprecise size estimation during memory acquisition. Default value is 0.7. | 
0.7 |
 | spark.comet.exec.shuffle.codec | The codec of Comet native shuffle used to 
compress shuffle data. Only zstd is supported. | zstd |
 | spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. 
By default, this config is false. Note that this requires setting 
'spark.shuffle.manager' to 
'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 
'spark.shuffle.manager' must be set before starting the Spark application and 
cannot be changed during the application. | false |
+| spark.comet.exec.shuffle.mode | The mode of Comet shuffle. This config is 
only effective if Comet shuffle is enabled. Available modes are 'native', 
'jvm', and 'auto'. 'native' is for native shuffle which has best performance in 
general. 'jvm' is for jvm-based columnar shuffle which has higher coverage than 
native shuffle. 'auto' is for Comet to choose the best shuffle mode based on 
the query plan. By default, this config is 'jvm'. | jvm |
 | spark.comet.explainFallback.enabled | When this setting is enabled, Comet 
will provide logging explaining the reason(s) why a query stage cannot be 
executed natively. | false |
 | spark.comet.memory.overhead.factor | Fraction of executor memory to be 
allocated as additional non-heap memory per executor process for Comet. Default 
value is 0.2. | 0.2 |
 | spark.comet.memory.overhead.min | Minimum amount of additional memory to be 
allocated per executor process for Comet, in MiB. | 402653184b |
diff --git a/_sources/user-guide/tuning.md.txt 
b/_sources/user-guide/tuning.md.txt
index 01fa7bdb..5a3100bd 100644
--- a/_sources/user-guide/tuning.md.txt
+++ b/_sources/user-guide/tuning.md.txt
@@ -39,22 +39,26 @@ It must be set before the Spark context is created. You can 
enable or disable Co
 at runtime by setting `spark.comet.exec.shuffle.enabled` to `true` or `false`.
 Once it is disabled, Comet will fallback to the default Spark shuffle manager.
 
-### Columnar Shuffle
+### Shuffle Mode
 
-By default, once `spark.comet.exec.shuffle.enabled` is enabled, Comet uses 
columnar shuffle
+Comet provides three shuffle modes: Columnar Shuffle, Native Shuffle and Auto 
Mode.
+
+#### Columnar Shuffle
+
+By default, once `spark.comet.exec.shuffle.enabled` is enabled, Comet uses 
JVM-based columnar shuffle
 to improve the performance of shuffle operations. Columnar shuffle supports 
HashPartitioning,
-RoundRobinPartitioning, RangePartitioning and SinglePartitioning.
+RoundRobinPartitioning, RangePartitioning and SinglePartitioning. This mode 
has the highest
+query coverage.
 
-Columnar shuffle can be disabled by setting 
`spark.comet.columnar.shuffle.enabled` to `false`.
+Columnar shuffle can be enabled by setting `spark.comet.exec.shuffle.mode` to 
`jvm`.
 
-### Native Shuffle
+#### Native Shuffle
 
 Comet also provides a fully native shuffle implementation that can be used to 
improve the performance.
-To enable native shuffle, just disable `spark.comet.columnar.shuffle.enabled`.
+To enable native shuffle, just set `spark.comet.exec.shuffle.mode` to `native`
 
 Native shuffle only supports HashPartitioning and SinglePartitioning.
 
+### Auto Mode
 
-
-
-
+`spark.comet.exec.shuffle.mode` to `auto` will let Comet choose the best 
shuffle mode based on the query plan.
diff --git a/searchindex.js b/searchindex.js
index bbcc1a9c..3b9b5384 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"ANSI mode": [[8, "ansi-mode"]], "API 
Differences Between Spark Versions": [[0, 
"api-differences-between-spark-versions"]], "ASF Links": [[7, null]], "Adding 
Spark-side Tests for the New Expression": [[0, 
"adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression": 
[[0, "adding-a-new-expression"]], "Adding a New Scalar Function Expression": 
[[0, "adding-a-new-scalar-function-expression"]], "Adding the Expression To the 
Protobuf Definition" [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"ANSI mode": [[8, "ansi-mode"]], "API 
Differences Between Spark Versions": [[0, 
"api-differences-between-spark-versions"]], "ASF Links": [[7, null]], "Adding 
Spark-side Tests for the New Expression": [[0, 
"adding-spark-side-tests-for-the-new-expression"]], "Adding a New Expression": 
[[0, "adding-a-new-expression"]], "Adding a New Scalar Function Expression": 
[[0, "adding-a-new-scalar-function-expression"]], "Adding the Expression To the 
Protobuf Definition" [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 3af81583..f248982d 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -337,46 +337,46 @@ under the License.
 <td><p>Number of threads used for Comet async columnar shuffle per shuffle 
task. By default, this config is 3. Note that more threads means more memory 
requirement to buffer shuffle data before flushing to disk. Also, more threads 
may not always improve performance, and should be set based on the number of 
cores available.</p></td>
 <td><p>3</p></td>
 </tr>
-<tr class="row-even"><td><p>spark.comet.columnar.shuffle.enabled</p></td>
-<td><p>Whether to enable Arrow-based columnar shuffle for Comet and Spark 
regular operators. If this is enabled, Comet prefers columnar shuffle than 
native shuffle. By default, this config is true.</p></td>
-<td><p>true</p></td>
-</tr>
-<tr class="row-odd"><td><p>spark.comet.columnar.shuffle.memory.factor</p></td>
+<tr class="row-even"><td><p>spark.comet.columnar.shuffle.memory.factor</p></td>
 <td><p>Fraction of Comet memory to be allocated per executor process for Comet 
shuffle. Comet memory size is specified by <code class="docutils literal 
notranslate"><span class="pre">spark.comet.memoryOverhead</span></code> or 
calculated by <code class="docutils literal notranslate"><span 
class="pre">spark.comet.memory.overhead.factor</span></code> * <code 
class="docutils literal notranslate"><span 
class="pre">spark.executor.memory</span></code>. By default, this config is 
1.0.</p></td>
 <td><p>1.0</p></td>
 </tr>
-<tr class="row-even"><td><p>spark.comet.debug.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.debug.enabled</p></td>
 <td><p>Whether to enable debug mode for Comet. By default, this config is 
false. When enabled, Comet will do additional checks for debugging purpose. For 
example, validating array when importing arrays from JVM at native side. Note 
that these checks may be expensive in performance and should only be enabled 
for debugging purpose.</p></td>
 <td><p>false</p></td>
 </tr>
-<tr class="row-odd"><td><p>spark.comet.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.enabled</p></td>
 <td><p>Whether to enable Comet extension for Spark. When this is turned on, 
Spark will use Comet to read Parquet data source. Note that to enable native 
vectorized execution, both this config and ‘spark.comet.exec.enabled’ need to 
be enabled. By default, this config is the value of the env var <code 
class="docutils literal notranslate"><span 
class="pre">ENABLE_COMET</span></code> if set, or true otherwise.</p></td>
 <td><p>true</p></td>
 </tr>
-<tr class="row-even"><td><p>spark.comet.exceptionOnDatetimeRebase</p></td>
+<tr class="row-odd"><td><p>spark.comet.exceptionOnDatetimeRebase</p></td>
 <td><p>Whether to throw exception when seeing dates/timestamps from the legacy 
hybrid (Julian + Gregorian) calendar. Since Spark 3, dates/timestamps were 
written according to the Proleptic Gregorian calendar. When this is true, Comet 
will throw exceptions when seeing these dates/timestamps that were written by 
Spark version before 3.0. If this is false, these dates/timestamps will be read 
as if they were written to the Proleptic Gregorian calendar and will not be 
rebased.</p></td>
 <td><p>false</p></td>
 </tr>
-<tr class="row-odd"><td><p>spark.comet.exec.all.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.all.enabled</p></td>
 <td><p>Whether to enable all Comet operators. By default, this config is 
false. Note that this config precedes all separate config 
‘spark.comet.exec.&lt;operator_name&gt;.enabled’. That being said, if this 
config is enabled, separate configs are ignored.</p></td>
 <td><p>false</p></td>
 </tr>
-<tr class="row-even"><td><p>spark.comet.exec.enabled</p></td>
+<tr class="row-odd"><td><p>spark.comet.exec.enabled</p></td>
 <td><p>Whether to enable Comet native vectorized execution for Spark. This 
controls whether Spark should convert operators into their Comet counterparts 
and execute them in native space. Note: each operator is associated with a 
separate config in the format of 
‘spark.comet.exec.&lt;operator_name&gt;.enabled’ at the moment, and both the 
config and this need to be turned on, in order for the operator to be executed 
in native. By default, this config is false.</p></td>
 <td><p>false</p></td>
 </tr>
-<tr class="row-odd"><td><p>spark.comet.exec.memoryFraction</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.memoryFraction</p></td>
 <td><p>The fraction of memory from Comet memory overhead that the native 
memory manager can use for execution. The purpose of this config is to set 
aside memory for untracked data structures, as well as imprecise size 
estimation during memory acquisition. Default value is 0.7.</p></td>
 <td><p>0.7</p></td>
 </tr>
-<tr class="row-even"><td><p>spark.comet.exec.shuffle.codec</p></td>
+<tr class="row-odd"><td><p>spark.comet.exec.shuffle.codec</p></td>
 <td><p>The codec of Comet native shuffle used to compress shuffle data. Only 
zstd is supported.</p></td>
 <td><p>zstd</p></td>
 </tr>
-<tr class="row-odd"><td><p>spark.comet.exec.shuffle.enabled</p></td>
+<tr class="row-even"><td><p>spark.comet.exec.shuffle.enabled</p></td>
 <td><p>Whether to enable Comet native shuffle. By default, this config is 
false. Note that this requires setting ‘spark.shuffle.manager’ to 
‘org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager’. 
‘spark.shuffle.manager’ must be set before starting the Spark application and 
cannot be changed during the application.</p></td>
 <td><p>false</p></td>
 </tr>
+<tr class="row-odd"><td><p>spark.comet.exec.shuffle.mode</p></td>
+<td><p>The mode of Comet shuffle. This config is only effective if Comet 
shuffle is enabled. Available modes are ‘native’, ‘jvm’, and ‘auto’. ‘native’ 
is for native shuffle which has best performance in general. ‘jvm’ is for 
jvm-based columnar shuffle which has higher coverage than native shuffle. 
‘auto’ is for Comet to choose the best shuffle mode based on the query plan. By 
default, this config is ‘jvm’.</p></td>
+<td><p>jvm</p></td>
+</tr>
 <tr class="row-even"><td><p>spark.comet.explainFallback.enabled</p></td>
 <td><p>When this setting is enabled, Comet will provide logging explaining the 
reason(s) why a query stage cannot be executed natively.</p></td>
 <td><p>false</p></td>
diff --git a/user-guide/tuning.html b/user-guide/tuning.html
index 0f5c7a87..ec97f952 100644
--- a/user-guide/tuning.html
+++ b/user-guide/tuning.html
@@ -267,13 +267,25 @@ under the License.
   </a>
   <ul class="nav section-nav flex-column">
    <li class="toc-h3 nav-item toc-entry">
-    <a class="reference internal nav-link" href="#columnar-shuffle">
-     Columnar Shuffle
+    <a class="reference internal nav-link" href="#shuffle-mode">
+     Shuffle Mode
     </a>
+    <ul class="nav section-nav flex-column">
+     <li class="toc-h4 nav-item toc-entry">
+      <a class="reference internal nav-link" href="#columnar-shuffle">
+       Columnar Shuffle
+      </a>
+     </li>
+     <li class="toc-h4 nav-item toc-entry">
+      <a class="reference internal nav-link" href="#native-shuffle">
+       Native Shuffle
+      </a>
+     </li>
+    </ul>
    </li>
    <li class="toc-h3 nav-item toc-entry">
-    <a class="reference internal nav-link" href="#native-shuffle">
-     Native Shuffle
+    <a class="reference internal nav-link" href="#auto-mode">
+     Auto Mode
     </a>
    </li>
   </ul>
@@ -340,20 +352,29 @@ The following sections describe the different shuffle 
options available in Comet
 It must be set before the Spark context is created. You can enable or disable 
Comet shuffle
 at runtime by setting <code class="docutils literal notranslate"><span 
class="pre">spark.comet.exec.shuffle.enabled</span></code> to <code 
class="docutils literal notranslate"><span class="pre">true</span></code> or 
<code class="docutils literal notranslate"><span 
class="pre">false</span></code>.
 Once it is disabled, Comet will fallback to the default Spark shuffle 
manager.</p>
+<section id="shuffle-mode">
+<h3>Shuffle Mode<a class="headerlink" href="#shuffle-mode" title="Link to this 
heading">¶</a></h3>
+<p>Comet provides three shuffle modes: Columnar Shuffle, Native Shuffle and 
Auto Mode.</p>
 <section id="columnar-shuffle">
-<h3>Columnar Shuffle<a class="headerlink" href="#columnar-shuffle" title="Link 
to this heading">¶</a></h3>
-<p>By default, once <code class="docutils literal notranslate"><span 
class="pre">spark.comet.exec.shuffle.enabled</span></code> is enabled, Comet 
uses columnar shuffle
+<h4>Columnar Shuffle<a class="headerlink" href="#columnar-shuffle" title="Link 
to this heading">¶</a></h4>
+<p>By default, once <code class="docutils literal notranslate"><span 
class="pre">spark.comet.exec.shuffle.enabled</span></code> is enabled, Comet 
uses JVM-based columnar shuffle
 to improve the performance of shuffle operations. Columnar shuffle supports 
HashPartitioning,
-RoundRobinPartitioning, RangePartitioning and SinglePartitioning.</p>
-<p>Columnar shuffle can be disabled by setting <code class="docutils literal 
notranslate"><span 
class="pre">spark.comet.columnar.shuffle.enabled</span></code> to <code 
class="docutils literal notranslate"><span class="pre">false</span></code>.</p>
+RoundRobinPartitioning, RangePartitioning and SinglePartitioning. This mode 
has the highest
+query coverage.</p>
+<p>Columnar shuffle can be enabled by setting <code class="docutils literal 
notranslate"><span class="pre">spark.comet.exec.shuffle.mode</span></code> to 
<code class="docutils literal notranslate"><span 
class="pre">jvm</span></code>.</p>
 </section>
 <section id="native-shuffle">
-<h3>Native Shuffle<a class="headerlink" href="#native-shuffle" title="Link to 
this heading">¶</a></h3>
+<h4>Native Shuffle<a class="headerlink" href="#native-shuffle" title="Link to 
this heading">¶</a></h4>
 <p>Comet also provides a fully native shuffle implementation that can be used 
to improve the performance.
-To enable native shuffle, just disable <code class="docutils literal 
notranslate"><span 
class="pre">spark.comet.columnar.shuffle.enabled</span></code>.</p>
+To enable native shuffle, just set <code class="docutils literal 
notranslate"><span class="pre">spark.comet.exec.shuffle.mode</span></code> to 
<code class="docutils literal notranslate"><span 
class="pre">native</span></code></p>
 <p>Native shuffle only supports HashPartitioning and SinglePartitioning.</p>
 </section>
 </section>
+<section id="auto-mode">
+<h3>Auto Mode<a class="headerlink" href="#auto-mode" title="Link to this 
heading">¶</a></h3>
+<p><code class="docutils literal notranslate"><span 
class="pre">spark.comet.exec.shuffle.mode</span></code> to <code 
class="docutils literal notranslate"><span class="pre">auto</span></code> will 
let Comet choose the best shuffle mode based on the query plan.</p>
+</section>
+</section>
 </section>
 
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(datafusion-comet) branch asf-site updated: Publish built docs triggered by 507e475bd06fbb9cc738d61473a8341caa6daf74

Reply via email to