coderfender commented on code in PR #2989: URL: https://github.com/apache/datafusion-comet/pull/2989#discussion_r2648750195
########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. +- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. + Literal default values are supported. + +--- ## ANSI Mode -Comet will fall back to Spark for the following expressions when ANSI mode is enabled. These expressions can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +When ANSI mode is enabled, Comet falls back to Spark for some expressions to ensure correctness. + +Affected expressions include: + +- `Average` +- `Cast` (in some cases) + +Incompatible expressions can be forced to run natively using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +This is not recommended for production use. -- Average -- Cast (in some cases) +Tracking issue for full ANSI support: +https://github.com/apache/datafusion-comet/issues/313 -There is an [epic](https://github.com/apache/datafusion-comet/issues/313) where we are tracking the work to fully implement ANSI support. +--- ## Floating-point Number Comparison -Spark normalizes NaN and zero for floating point numbers for several cases. See `NormalizeFloatingNumbers` optimization rule in Spark. -However, one exception is comparison. Spark does not normalize NaN and zero when comparing values -because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`). But the comparison -functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)). -So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences -to Spark in some cases, especially when the data contains both positive and negative zero. This is likely an edge -case that is not of concern for many users. If it is a concern, setting `spark.comet.exec.strictFloatingPoint=true` -will make relevant operations fall back to Spark. +Spark normalizes NaN and signed zero in most cases, except during comparisons. +DataFusion (via Arrow) does not apply the same normalization. + +Comet adds additional normalization logic but may still differ in rare edge cases, +especially when both positive and negative zero are present. + +To force Spark execution for strict correctness: + +```text +spark.comet.exec.strictFloatingPoint=true +``` + +--- ## Incompatible Expressions -Expressions that are not 100% Spark-compatible will fall back to Spark by default and can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. 
See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +Expressions that are not fully Spark-compatible fall back to Spark by default. +They may be enabled using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +See the [Comet Supported Expressions Guide](expressions.md) for details. + +--- ## Regular Expressions -Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java's -regular expression engine. Comet will fall back to Spark for patterns that are known to produce different results, but -this can be overridden by setting `spark.comet.regexp.allowIncompatible=true`. +Comet uses Rust's regex crate, which differs from Java's regex engine used by Spark. +Patterns known to produce different results will fall back to Spark. + +This behavior can be overridden using: + +```text +spark.comet.regexp.allowIncompatible=true +``` + +--- ## Window Functions -Comet's support for window functions is incomplete and known to be incorrect. It is disabled by default and -should not be used in production. The feature will be enabled in a future release. Tracking issue: [#2721](https://github.com/apache/datafusion-comet/issues/2721). +Window functions are disabled by default. +Support is incomplete and may produce incorrect results. + +Tracking issue: +https://github.com/apache/datafusion-comet/issues/2721 + +--- ## Cast -Cast operations in Comet fall into three levels of support: +Cast operations are classified into three categories: -- **Compatible**: The results match Apache Spark -- **Incompatible**: The results may match Apache Spark for some inputs, but there are known issues where some inputs - will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting - `spark.comet.expression.Cast.allowIncompatible=true` will allow all incompatible casts to run natively in Comet, but this is not - recommended for production use. -- **Unsupported**: Comet does not provide a native version of this cast expression and the query stage will fall back to - Spark. +- **Compatible** – Results match Spark +- **Incompatible** – May differ for some inputs; fallback by default +- **Unsupported** – Always falls back to Spark ### Compatible Casts -The following cast operations are generally compatible with Spark except for the differences noted here. +The following casts are generally compatible with Spark, with noted differences. Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. +- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. + Literal default values are supported. + +--- ## ANSI Mode -Comet will fall back to Spark for the following expressions when ANSI mode is enabled. 
These expressions can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +When ANSI mode is enabled, Comet falls back to Spark for some expressions to ensure correctness. + +Affected expressions include: + +- `Average` +- `Cast` (in some cases) + +Incompatible expressions can be forced to run natively using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +This is not recommended for production use. -- Average -- Cast (in some cases) +Tracking issue for full ANSI support: +https://github.com/apache/datafusion-comet/issues/313 Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -161,7 +196,7 @@ The following cast operations are generally compatible with Spark except for the | string | float | | | string | double | | | string | binary | | -| string | date | Only supports years between 262143 BC and 262142 AD | +| string | date | Years supported: 262143 BC to 262142 AD | Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -136,14 +171,14 @@ The following cast operations are generally compatible with Spark except for the | float | integer | | | float | long | | | float | double | | -| float | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 | +| float | string | Precision differences possible | | double | boolean | | | double | byte | | | double | short | | | double | integer | | | double | long | | | double | float | | -| double | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 | +| double | string | Precision differences possible | Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -136,14 +171,14 @@ The following cast operations are generally compatible with Spark except for the | float | integer | | | float | long | | | float | double | | -| float | string | There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45 | +| float | string | Precision differences possible | Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. +- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. + Literal default values are supported. Review Comment: Not sure if we are covering Decimal -> Binary format documentation in this change ? 
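For context on the `float`/`double` → `string` precision rows quoted above, here is a minimal spark-shell sketch (not part of the diff or of the reviewer's comments) of the difference that the original wording spelled out; it assumes an existing SparkSession named `spark` running with the Comet plugin:

```scala
// Minimal sketch, assuming a spark-shell session (`spark`) with the Comet plugin enabled.
// The original doc text noted that casting this subnormal float to string can yield
// "1.0E-45" under Comet's native cast, where Spark itself renders "1.4E-45".
spark.sql("SELECT CAST(CAST('1.4E-45' AS FLOAT) AS STRING) AS s").show(truncate = false)
```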
########## docs/source/user-guide/latest/compatibility.md: ########## @@ -152,7 +187,7 @@ The following cast operations are generally compatible with Spark except for the | decimal | float | | | decimal | double | | | decimal | decimal | | -| decimal | string | There can be formatting differences in some case due to Spark using scientific notation where Comet does not | +| decimal | string | Formatting differences possible | Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. +- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. + Literal default values are supported. + +--- ## ANSI Mode -Comet will fall back to Spark for the following expressions when ANSI mode is enabled. These expressions can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +When ANSI mode is enabled, Comet falls back to Spark for some expressions to ensure correctness. + +Affected expressions include: + +- `Average` +- `Cast` (in some cases) + +Incompatible expressions can be forced to run natively using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +This is not recommended for production use. -- Average -- Cast (in some cases) +Tracking issue for full ANSI support: +https://github.com/apache/datafusion-comet/issues/313 -There is an [epic](https://github.com/apache/datafusion-comet/issues/313) where we are tracking the work to fully implement ANSI support. +--- ## Floating-point Number Comparison -Spark normalizes NaN and zero for floating point numbers for several cases. See `NormalizeFloatingNumbers` optimization rule in Spark. -However, one exception is comparison. Spark does not normalize NaN and zero when comparing values -because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`). But the comparison -functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)). -So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences -to Spark in some cases, especially when the data contains both positive and negative zero. This is likely an edge -case that is not of concern for many users. If it is a concern, setting `spark.comet.exec.strictFloatingPoint=true` -will make relevant operations fall back to Spark. +Spark normalizes NaN and signed zero in most cases, except during comparisons. +DataFusion (via Arrow) does not apply the same normalization. 
+ +Comet adds additional normalization logic but may still differ in rare edge cases, +especially when both positive and negative zero are present. + +To force Spark execution for strict correctness: + +```text +spark.comet.exec.strictFloatingPoint=true +``` + +--- ## Incompatible Expressions -Expressions that are not 100% Spark-compatible will fall back to Spark by default and can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +Expressions that are not fully Spark-compatible fall back to Spark by default. +They may be enabled using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +See the [Comet Supported Expressions Guide](expressions.md) for details. + +--- ## Regular Expressions -Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java's -regular expression engine. Comet will fall back to Spark for patterns that are known to produce different results, but -this can be overridden by setting `spark.comet.regexp.allowIncompatible=true`. +Comet uses Rust's regex crate, which differs from Java's regex engine used by Spark. +Patterns known to produce different results will fall back to Spark. + +This behavior can be overridden using: + +```text +spark.comet.regexp.allowIncompatible=true +``` + +--- ## Window Functions -Comet's support for window functions is incomplete and known to be incorrect. It is disabled by default and -should not be used in production. The feature will be enabled in a future release. Tracking issue: [#2721](https://github.com/apache/datafusion-comet/issues/2721). +Window functions are disabled by default. +Support is incomplete and may produce incorrect results. + +Tracking issue: +https://github.com/apache/datafusion-comet/issues/2721 + Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. +- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. Review Comment: We might want to keep the changes limited to the issue being addressed IMHO ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. 
+- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. + Literal default values are supported. + +--- ## ANSI Mode -Comet will fall back to Spark for the following expressions when ANSI mode is enabled. These expressions can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +When ANSI mode is enabled, Comet falls back to Spark for some expressions to ensure correctness. + +Affected expressions include: + +- `Average` +- `Cast` (in some cases) + +Incompatible expressions can be forced to run natively using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +This is not recommended for production use. -- Average -- Cast (in some cases) +Tracking issue for full ANSI support: +https://github.com/apache/datafusion-comet/issues/313 -There is an [epic](https://github.com/apache/datafusion-comet/issues/313) where we are tracking the work to fully implement ANSI support. +--- ## Floating-point Number Comparison -Spark normalizes NaN and zero for floating point numbers for several cases. See `NormalizeFloatingNumbers` optimization rule in Spark. -However, one exception is comparison. Spark does not normalize NaN and zero when comparing values -because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`). But the comparison -functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)). -So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences -to Spark in some cases, especially when the data contains both positive and negative zero. This is likely an edge -case that is not of concern for many users. If it is a concern, setting `spark.comet.exec.strictFloatingPoint=true` -will make relevant operations fall back to Spark. +Spark normalizes NaN and signed zero in most cases, except during comparisons. +DataFusion (via Arrow) does not apply the same normalization. + +Comet adds additional normalization logic but may still differ in rare edge cases, +especially when both positive and negative zero are present. + +To force Spark execution for strict correctness: + +```text +spark.comet.exec.strictFloatingPoint=true +``` + +--- ## Incompatible Expressions -Expressions that are not 100% Spark-compatible will fall back to Spark by default and can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +Expressions that are not fully Spark-compatible fall back to Spark by default. +They may be enabled using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +See the [Comet Supported Expressions Guide](expressions.md) for details. + +--- ## Regular Expressions -Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java's -regular expression engine. Comet will fall back to Spark for patterns that are known to produce different results, but -this can be overridden by setting `spark.comet.regexp.allowIncompatible=true`. 
+Comet uses Rust's regex crate, which differs from Java's regex engine used by Spark. +Patterns known to produce different results will fall back to Spark. + +This behavior can be overridden using: + +```text +spark.comet.regexp.allowIncompatible=true +``` + +--- ## Window Functions -Comet's support for window functions is incomplete and known to be incorrect. It is disabled by default and -should not be used in production. The feature will be enabled in a future release. Tracking issue: [#2721](https://github.com/apache/datafusion-comet/issues/2721). +Window functions are disabled by default. +Support is incomplete and may produce incorrect results. + +Tracking issue: +https://github.com/apache/datafusion-comet/issues/2721 + +--- ## Cast -Cast operations in Comet fall into three levels of support: +Cast operations are classified into three categories: -- **Compatible**: The results match Apache Spark -- **Incompatible**: The results may match Apache Spark for some inputs, but there are known issues where some inputs - will result in incorrect results or exceptions. The query stage will fall back to Spark by default. Setting - `spark.comet.expression.Cast.allowIncompatible=true` will allow all incompatible casts to run natively in Comet, but this is not - recommended for production use. -- **Unsupported**: Comet does not provide a native version of this cast expression and the query stage will fall back to - Spark. +- **Compatible** – Results match Spark +- **Incompatible** – May differ for some inputs; fallback by default +- **Unsupported** – Always falls back to Spark Review Comment: Is there a reason we removed the additional comet conf here ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. +- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. + Literal default values are supported. + +--- ## ANSI Mode -Comet will fall back to Spark for the following expressions when ANSI mode is enabled. These expressions can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +When ANSI mode is enabled, Comet falls back to Spark for some expressions to ensure correctness. + +Affected expressions include: + +- `Average` +- `Cast` (in some cases) + +Incompatible expressions can be forced to run natively using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +This is not recommended for production use. -- Average -- Cast (in some cases) +Tracking issue for full ANSI support: +https://github.com/apache/datafusion-comet/issues/313 -There is an [epic](https://github.com/apache/datafusion-comet/issues/313) where we are tracking the work to fully implement ANSI support. 
+--- ## Floating-point Number Comparison -Spark normalizes NaN and zero for floating point numbers for several cases. See `NormalizeFloatingNumbers` optimization rule in Spark. -However, one exception is comparison. Spark does not normalize NaN and zero when comparing values -because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`). But the comparison -functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)). -So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences -to Spark in some cases, especially when the data contains both positive and negative zero. This is likely an edge -case that is not of concern for many users. If it is a concern, setting `spark.comet.exec.strictFloatingPoint=true` -will make relevant operations fall back to Spark. +Spark normalizes NaN and signed zero in most cases, except during comparisons. +DataFusion (via Arrow) does not apply the same normalization. + +Comet adds additional normalization logic but may still differ in rare edge cases, +especially when both positive and negative zero are present. + +To force Spark execution for strict correctness: + +```text +spark.comet.exec.strictFloatingPoint=true +``` + +--- ## Incompatible Expressions -Expressions that are not 100% Spark-compatible will fall back to Spark by default and can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +Expressions that are not fully Spark-compatible fall back to Spark by default. +They may be enabled using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. +- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. + Literal default values are supported. + +--- ## ANSI Mode -Comet will fall back to Spark for the following expressions when ANSI mode is enabled. These expressions can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +When ANSI mode is enabled, Comet falls back to Spark for some expressions to ensure correctness. + +Affected expressions include: + +- `Average` +- `Cast` (in some cases) + +Incompatible expressions can be forced to run natively using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +This is not recommended for production use. 
-- Average -- Cast (in some cases) +Tracking issue for full ANSI support: +https://github.com/apache/datafusion-comet/issues/313 -There is an [epic](https://github.com/apache/datafusion-comet/issues/313) where we are tracking the work to fully implement ANSI support. +--- ## Floating-point Number Comparison -Spark normalizes NaN and zero for floating point numbers for several cases. See `NormalizeFloatingNumbers` optimization rule in Spark. -However, one exception is comparison. Spark does not normalize NaN and zero when comparing values -because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`). But the comparison -functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)). -So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences -to Spark in some cases, especially when the data contains both positive and negative zero. This is likely an edge -case that is not of concern for many users. If it is a concern, setting `spark.comet.exec.strictFloatingPoint=true` -will make relevant operations fall back to Spark. +Spark normalizes NaN and signed zero in most cases, except during comparisons. +DataFusion (via Arrow) does not apply the same normalization. + +Comet adds additional normalization logic but may still differ in rare edge cases, +especially when both positive and negative zero are present. + +To force Spark execution for strict correctness: + +```text Review Comment: Not sure if this intended ? ########## docs/source/user-guide/latest/compatibility.md: ########## @@ -20,70 +20,105 @@ under the License. # Compatibility Guide Comet aims to provide consistent results with the version of Apache Spark that is being used. +This guide documents known differences and limitations. -This guide offers information about areas of functionality where there are known differences. +--- ## Parquet Comet has the following limitations when reading Parquet files: -- Comet does not support reading decimals encoded in binary format. -- No support for default values that are nested types (e.g., maps, arrays, structs). Literal default values are supported. +- Decimal values encoded using binary format are not supported. +- Default values for nested types (e.g., maps, arrays, structs) are not supported. + Literal default values are supported. + +--- ## ANSI Mode -Comet will fall back to Spark for the following expressions when ANSI mode is enabled. These expressions can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +When ANSI mode is enabled, Comet falls back to Spark for some expressions to ensure correctness. + +Affected expressions include: + +- `Average` +- `Cast` (in some cases) + +Incompatible expressions can be forced to run natively using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +This is not recommended for production use. -- Average -- Cast (in some cases) +Tracking issue for full ANSI support: +https://github.com/apache/datafusion-comet/issues/313 -There is an [epic](https://github.com/apache/datafusion-comet/issues/313) where we are tracking the work to fully implement ANSI support. 
+--- ## Floating-point Number Comparison -Spark normalizes NaN and zero for floating point numbers for several cases. See `NormalizeFloatingNumbers` optimization rule in Spark. -However, one exception is comparison. Spark does not normalize NaN and zero when comparing values -because they are handled well in Spark (e.g., `SQLOrderingUtil.compareFloats`). But the comparison -functions of arrow-rs used by DataFusion do not normalize NaN and zero (e.g., [arrow::compute::kernels::cmp::eq](https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html#)). -So Comet adds additional normalization expression of NaN and zero for comparisons, and may still have differences -to Spark in some cases, especially when the data contains both positive and negative zero. This is likely an edge -case that is not of concern for many users. If it is a concern, setting `spark.comet.exec.strictFloatingPoint=true` -will make relevant operations fall back to Spark. +Spark normalizes NaN and signed zero in most cases, except during comparisons. +DataFusion (via Arrow) does not apply the same normalization. + +Comet adds additional normalization logic but may still differ in rare edge cases, +especially when both positive and negative zero are present. + +To force Spark execution for strict correctness: + +```text +spark.comet.exec.strictFloatingPoint=true +``` + +--- ## Incompatible Expressions -Expressions that are not 100% Spark-compatible will fall back to Spark by default and can be enabled by setting -`spark.comet.expression.EXPRNAME.allowIncompatible=true`, where `EXPRNAME` is the Spark expression class name. See -the [Comet Supported Expressions Guide](expressions.md) for more information on this configuration setting. +Expressions that are not fully Spark-compatible fall back to Spark by default. +They may be enabled using: + +```text +spark.comet.expression.EXPRNAME.allowIncompatible=true +``` + +See the [Comet Supported Expressions Guide](expressions.md) for details. + +--- ## Regular Expressions -Comet uses the Rust regexp crate for evaluating regular expressions, and this has different behavior from Java's -regular expression engine. Comet will fall back to Spark for patterns that are known to produce different results, but -this can be overridden by setting `spark.comet.regexp.allowIncompatible=true`. +Comet uses Rust's regex crate, which differs from Java's regex engine used by Spark. +Patterns known to produce different results will fall back to Spark. + Review Comment: Not sure if this intended ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
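As a closing illustration of the regexp fallback discussed in the last quoted hunk, a minimal sketch (not part of the review) of opting in to Comet's native regular-expression handling via the config name quoted in the doc; `spark` is assumed to be an existing SparkSession with Comet enabled, and the override is not recommended for production use:

```scala
// Minimal sketch: allow Comet to evaluate regexps natively with Rust's regex crate,
// accepting known differences from Java's regex engine (the Rust crate, for example,
// does not support backreferences or look-around).
spark.conf.set("spark.comet.regexp.allowIncompatible", "true")
spark.sql("SELECT 'abc123' RLIKE '[a-z]+[0-9]+' AS matched").show()
```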
