This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 682c9a222 Publish built docs triggered by 0b33b051ddab43d188e3637b635fed18330bccc5
682c9a222 is described below
commit 682c9a2228ae44b30e845af97c4f3e2563f58c39
Author: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
AuthorDate: Tue Oct 28 17:22:37 2025 +0000
Publish built docs triggered by 0b33b051ddab43d188e3637b635fed18330bccc5
---
_sources/user-guide/latest/compatibility.md.txt | 4 ++++
_sources/user-guide/latest/tuning.md.txt | 6 ++++++
searchindex.js | 2 +-
user-guide/latest/compatibility.html | 3 +++
user-guide/latest/tuning.html | 6 ++++++
5 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/_sources/user-guide/latest/compatibility.md.txt b/_sources/user-guide/latest/compatibility.md.txt
index 562baabfd..6c3bab59d 100644
--- a/_sources/user-guide/latest/compatibility.md.txt
+++ b/_sources/user-guide/latest/compatibility.md.txt
@@ -97,6 +97,10 @@ because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`).
functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)).
So Comet will add additional normalization expression of NaN and zero for comparison.
+Sorting on floating-point data types (or complex types containing floating-point values) is not compatible with
+Spark if the data contains both zero and negative zero. This is likely an edge case that is not of concern for many users,
+and sorting on floating-point data can be enabled by setting `spark.comet.expression.SortOrder.allowIncompatible=true`.
+
There is a known bug with using count(distinct) within aggregate queries, where each NaN value will be counted
separately [#1824](https://github.com/apache/datafusion-comet/issues/1824).
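As a minimal illustration of the zero/negative-zero caveat in the doc change above (not part of the commit): under IEEE 754, `0.0` and `-0.0` compare equal, but their bit patterns differ, so an ordering that distinguishes them (such as Rust's `f64::total_cmp`, which places `-0.0` before `0.0`) can disagree with Spark, which normalizes `-0.0` to `0.0` before comparing.

```python
import struct

# 0.0 and -0.0 compare equal under IEEE 754 arithmetic...
assert 0.0 == -0.0

# ...but only -0.0 has the sign bit set, so a byte-wise or
# total-order comparison can distinguish values that Spark's
# normalizing comparator treats as identical.
pos = struct.pack(">d", 0.0).hex()
neg = struct.pack(">d", -0.0).hex()
print(pos)  # 0000000000000000
print(neg)  # 8000000000000000
```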
diff --git a/_sources/user-guide/latest/tuning.md.txt b/_sources/user-guide/latest/tuning.md.txt
index cc0109526..21b1df652 100644
--- a/_sources/user-guide/latest/tuning.md.txt
+++ b/_sources/user-guide/latest/tuning.md.txt
@@ -100,6 +100,12 @@ Comet Performance
It may be possible to reduce Comet's memory overhead by reducing batch sizes or increasing number of partitions.
+## Optimizing Sorting on Floating-Point Values
+
+Sorting on floating-point data types (or complex types containing floating-point values) is not compatible with
+Spark if the data contains both zero and negative zero. This is likely an edge case that is not of concern for many users,
+and sorting on floating-point data can be enabled by setting `spark.comet.expression.SortOrder.allowIncompatible=true`.
+
## Optimizing Joins
Spark often chooses `SortMergeJoin` over `ShuffledHashJoin` for stability reasons. If the build-side of a
diff --git a/searchindex.js b/searchindex.js
index e3cf0355d..cd5455146 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"1. Install Comet": [[16, "install-comet"]], "2. Clone Spark and Apply Diff": [[16, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[16, "run-spark-sql-tests"]], "ANSI Mode": [[19, "ansi-mode"], [32, "ansi-mode"], [72, "ansi-mode"]], "ANSI mode": [[45, "ansi-mode"], [58, "ansi-mode"]], "API Differences Between Spark Versions": [[3, "api-differences-between-spark-versions"]], "ASF Links": [[2, null], [2, null]], "Accelerating Apache Iceberg Parque [...]
\ No newline at end of file
+Search.setIndex({"alltitles": {"1. Install Comet": [[16, "install-comet"]], "2. Clone Spark and Apply Diff": [[16, "clone-spark-and-apply-diff"]], "3. Run Spark SQL Tests": [[16, "run-spark-sql-tests"]], "ANSI Mode": [[19, "ansi-mode"], [32, "ansi-mode"], [72, "ansi-mode"]], "ANSI mode": [[45, "ansi-mode"], [58, "ansi-mode"]], "API Differences Between Spark Versions": [[3, "api-differences-between-spark-versions"]], "ASF Links": [[2, null], [2, null]], "Accelerating Apache Iceberg Parque [...]
\ No newline at end of file
diff --git a/user-guide/latest/compatibility.html b/user-guide/latest/compatibility.html
index ee6f8181d..b55c99b35 100644
--- a/user-guide/latest/compatibility.html
+++ b/user-guide/latest/compatibility.html
@@ -548,6 +548,9 @@ However, one exception is comparison. Spark does not normalize NaN and zero when
because they are handled well in Spark (e.g., <code class="docutils literal notranslate"><span class="pre">SQLOrderingUtil.compareFloats</span></code>). But the comparison
functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., <a class="reference external" href="https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#">arrow::compute::kernels::cmp::eq</a>).
So Comet will add additional normalization expression of NaN and zero for comparison.</p>
+<p>Sorting on floating-point data types (or complex types containing floating-point values) is not compatible with
+Spark if the data contains both zero and negative zero. This is likely an edge case that is not of concern for many users,
+and sorting on floating-point data can be enabled by setting <code class="docutils literal notranslate"><span class="pre">spark.comet.expression.SortOrder.allowIncompatible=true</span></code>.</p>
<p>There is a known bug with using count(distinct) within aggregate queries, where each NaN value will be counted
separately <a class="reference external" href="https://github.com/apache/datafusion-comet/issues/1824">#1824</a>.</p>
</section>
diff --git a/user-guide/latest/tuning.html b/user-guide/latest/tuning.html
index caac28b3d..952dc3e9e 100644
--- a/user-guide/latest/tuning.html
+++ b/user-guide/latest/tuning.html
@@ -523,6 +523,12 @@ providing better performance than Spark for half the resource</p></li>
<p>It may be possible to reduce Comet’s memory overhead by reducing batch sizes or increasing number of partitions.</p>
</section>
</section>
+<section id="optimizing-sorting-on-floating-point-values">
+<h2>Optimizing Sorting on Floating-Point Values<a class="headerlink" href="#optimizing-sorting-on-floating-point-values" title="Link to this heading">#</a></h2>
+<p>Sorting on floating-point data types (or complex types containing floating-point values) is not compatible with
+Spark if the data contains both zero and negative zero. This is likely an edge case that is not of concern for many users,
+and sorting on floating-point data can be enabled by setting <code class="docutils literal notranslate"><span class="pre">spark.comet.expression.SortOrder.allowIncompatible=true</span></code>.</p>
+</section>
<section id="optimizing-joins">
<h2>Optimizing Joins<a class="headerlink" href="#optimizing-joins" title="Link to this heading">#</a></h2>
<p>Spark often chooses <code class="docutils literal notranslate"><span class="pre">SortMergeJoin</span></code> over <code class="docutils literal notranslate"><span class="pre">ShuffledHashJoin</span></code> for stability reasons. If the build-side of a
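For readers of the docs changed above: if the zero/negative-zero ordering difference is acceptable for a workload, the new config can be set like any other Spark conf. A sketch of a session-level configuration (assumes a Spark installation with the Comet plugin jar on the classpath, per the Comet installation docs; the app name is a placeholder, and this fragment is not runnable without that setup):

```python
from pyspark.sql import SparkSession

# Configuration sketch: opt in to Comet-accelerated sorts on
# floating-point columns, accepting the +0.0/-0.0 ordering
# difference described in the compatibility guide.
spark = (
    SparkSession.builder
    .appName("comet-fp-sort-example")  # placeholder name
    .config("spark.plugins", "org.apache.spark.CometPlugin")
    .config("spark.comet.enabled", "true")
    .config("spark.comet.expression.SortOrder.allowIncompatible", "true")
    .getOrCreate()
)
```

The same three settings can equally be passed as `--conf` arguments to `spark-submit` or placed in `spark-defaults.conf`.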
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]