oglego opened a new pull request, #4376:
URL: https://github.com/apache/datafusion-comet/pull/4376

   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   `cardinality` is a commonly used Spark SQL function (since 2.4.0) that was 
not yet supported natively by Comet, causing fallback to Spark execution. It 
returns the number of elements in an array or the number of key-value pairs in 
a map.
   
   Per the Spark docs, `cardinality` returns `-1` for null input only when both 
`spark.sql.legacy.sizeOfNull=true` AND `spark.sql.ansi.enabled=false`. With 
default settings it always returns `NULL` for null input. In Spark's 
implementation, `cardinality` is a direct alias for the `Size` expression with 
`legacySizeOfNull = false` hardcoded at parse time — the two-config gate never 
applies to `cardinality`.
   
   ## What changes are included in this PR?
   
   Adds native support for `cardinality(expr)` for both array and map inputs by 
extending the existing `CometSize` serde. There was no separate `Cardinality` 
class to wire — both `cardinality` and `size` produce a `Size` node in the 
logical plan. Two gaps in the Scala layer and one in the Rust layer prevented 
`cardinality` from offloading:
   
   1. `CometSize.getSupportLevel` was returning `Unsupported` for `MapType` 
inputs
   2. `CometSize.convert` was reading `SQLConf.get.legacySizeOfNull` (the 
global config) rather than `expr.legacySizeOfNull` (the instance field set at 
parse time), so `cardinality` could incorrectly return `-1` for `NULL` when the 
legacy config was enabled
   3. `SparkSizeFunc` in the Rust layer used `Signature::uniform` with exact 
type stubs, causing DataFusion's type coercion to reject real map columns with 
concrete inner field types
   
   Changes:
   
   - Extended `CometSize` in `arrays.scala` to accept `MapType` as 
`Compatible()`
   - Fixed `CometSize.convert` to use `expr.legacySizeOfNull` instead of the 
global config
   - Changed `SparkSizeFunc` signature to `Signature::any(1)` so real map and 
array types are accepted regardless of inner field types
   - Fixed null map entries in `SparkSizeFunc` to append `null` instead of 
hardcoded `-1`
   - Removed `spark_answer_only` from the `size(arr), size(m)` column query in 
`size.sql` since map inputs now run natively
   - Annotated the `Size` row in `expressions.md` to note it also covers the 
`cardinality` SQL alias
   
   The `implement-comet-expression` and `wire-datafusion-function` skills were 
used to scaffold this implementation.
   
   ## How are these changes tested?
   
   New Comet SQL Test at 
`spark/src/test/resources/sql-tests/expressions/array/cardinality.sql`. Covers:
   
   - Array column input
   - Map column input
   - Array and map columns in the same query
   - `NULL` array and map inputs via column reference (asserts `NULL` is 
returned, not `-1`)
   - Nested array column input (`array<array<int>>`)
   - Array-of-structs column input (`array<struct<a: int>>`)
   - Literal array and map arguments (`spark_answer_only` — 
`CreateArray`/`CreateMap` not yet natively supported by Comet, but correctness 
is verified against Spark)
   
   The existing `size.sql` test now also covers the map-input path natively 
(removed `spark_answer_only`).
   
   ```bash
   ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite cardinality" 
-Dtest=none
   ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite size" 
-Dtest=none
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to