andygrove opened a new pull request, #4483:
URL: https://github.com/apache/datafusion-comet/pull/4483

   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   This continues the per-category expression audit, following the same pattern as #4480 
(predicate), #4479 (bitwise), #4478 (map), #4476 (hash), #4475 (conditional), 
#4474 (misc), #4473 (collection), #4470 (json), and #4469 (struct). It uses the 
updated `audit-comet-expression` skill from #4468.
   
   ## What changes are included in this PR?
   
   ### Support-doc audit notes
   
   Add per-version audit sub-bullets to all 19 not-yet-audited array 
expressions (`array`, `array_append`, `array_compact`, `array_contains`, 
`array_distinct`, `array_except`, `array_join`, `array_max`, `array_min`, 
`array_position`, `array_remove`, `array_repeat`, `array_union`, 
`arrays_overlap`, `arrays_zip`, `element_at`, `flatten`, `get`, `sort_array`). 
Add 4.1.1 audit lines to the already-audited `array_insert` and 
`array_intersect`.
   
   Highlights from the cross-version review:
   
   - Across the category, Spark 4.0 replaces the `NullIntolerant` trait with a 
`nullIntolerant: Boolean` field and, for elements that participate in 
collation, widens `StringType` inputs to `StringTypeWithCollation`.
   - `ArrayAppend` becomes `RuntimeReplaceable` in 4.0 (rewrites to 
`ArrayInsert(arr, -1, elem)`); `CometArrayAppend` is unreachable in 4.0+.
   - `ArrayCompact` is always `RuntimeReplaceable`; Comet dispatches through 
`CometArrayFilter`.
   - In 4.0, `SortArray` widens `ascendingOrder` from a `Literal` to any foldable 
boolean expression; `CometSortArray` still requires a `Literal`.
   - In 4.0, the ANSI-mode default for `ElementAt` and `GetArrayItem` flips to 
true, so out-of-bounds accesses raise an error instead of returning NULL.
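   The rewrites and the ANSI change listed above can be modeled in a few lines of plain Scala. This is a simplified sketch, not Spark or Comet code: arrays are `Seq[Option[A]]` with `None` standing for NULL, and `ArraySemanticsModel`, `arrayCompact`, `arrayInsert`, `arrayAppend`, and `elementAt` are hypothetical names. `arrayInsert` only models in-range positive positions and `-1`.

```scala
// Hypothetical, simplified models of Spark array semantics.
// Arrays are Seq[Option[A]]; None stands for a NULL element.
object ArraySemanticsModel {
  type Arr[A] = Seq[Option[A]]

  // array_compact keeps only non-NULL elements, i.e. it behaves like
  // filter(arr, x -> x IS NOT NULL), which is why Comet can serve it
  // through its array filter support.
  def arrayCompact[A](arr: Arr[A]): Arr[A] = arr.filter(_.isDefined)

  // Simplified array_insert: positive positions are 1-based and -1 places
  // the element after the last one. Other positions are not modeled here.
  def arrayInsert[A](arr: Arr[A], pos: Int, elem: Option[A]): Arr[A] =
    if (pos == -1) arr :+ elem
    else if (pos >= 1 && pos <= arr.length + 1) {
      val (before, after) = arr.splitAt(pos - 1)
      (before :+ elem) ++ after
    } else throw new UnsupportedOperationException(s"position $pos is not modeled")

  // Spark 4.0 rewrites array_append(arr, elem) to array_insert(arr, -1, elem).
  def arrayAppend[A](arr: Arr[A], elem: Option[A]): Arr[A] = arrayInsert(arr, -1, elem)

  // element_at is 1-based; negative indices count from the end and 0 is rejected.
  // Out of bounds: NULL when ANSI mode is off, an error when it is on.
  def elementAt[A](arr: Arr[A], index: Int, ansi: Boolean): Option[A] = {
    require(index != 0, "element_at index must not be 0")
    val i = if (index > 0) index - 1 else arr.length + index
    if (i >= 0 && i < arr.length) arr(i)
    else if (ansi) throw new ArrayIndexOutOfBoundsException(s"index $index")
    else None
  }
}
```

With the 4.0 default, the same out-of-bounds `elementAt` call that used to yield `None` now throws, which is why the default flip matters for Comet's ANSI handling.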
   
   ### Support-level consistency fixes (in `arrays.scala`)
   
   - `CometArrayExcept`: `Incompatible(None)` -> `Incompatible(Some(reason))` 
via a shared `private val` so the EXPLAIN message matches the 
compatibility-guide text.
   - `CometArrayJoin`: same fix.
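   The shared-reason pattern can be sketched as follows. The support-level types here are hypothetical stand-ins for Comet's real ones, and the reason text is a placeholder; the actual wording lives in `arrays.scala` and the compatibility guide.

```scala
// Hypothetical stand-ins for Comet's support-level types; the real
// definitions in Comet's serde code may have a different shape.
sealed trait SupportLevel
case class Incompatible(notes: Option[String] = None) extends SupportLevel

// Hypothetical serde shape: each expression reports how well Comet supports it.
trait ExprSupportSketch {
  def getSupportLevel(): SupportLevel
}

object CometArrayExceptSketch extends ExprSupportSketch {
  // One shared string, so the reason shown by EXPLAIN and the text in the
  // compatibility guide are maintained in one place.
  // Placeholder wording, not the real reason from arrays.scala.
  private val incompatibleReason: String =
    "placeholder: see the array_except entry in the compatibility guide"

  // Before the fix this returned Incompatible(None), leaving EXPLAIN with no reason.
  override def getSupportLevel(): SupportLevel = Incompatible(Some(incompatibleReason))
}
```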
   
   ### Tracking issues filed for follow-up
   
   - #4481 `array_distinct` / `array_union` / `array_except` do not 
canonicalize NaN / signed-zero like Spark's `SQLOpenHashSet`.
   - #4482 `array_max` / `array_min` disagree with Spark on NaN ordering.
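   The behavior these two issues track can be illustrated with a small Scala model of Spark's semantics as described above: de-duplication that treats every NaN as one value and `-0.0` as `0.0`, and an ordering in which NaN is greater than every other value. `NaNSemanticsModel` and its functions are hypothetical names; this is not Spark's `SQLOpenHashSet` or Comet's native code, and treating `-0.0` and `0.0` as equal in the ordering is an assumption of the sketch.

```scala
// Hypothetical model of Spark's NaN / signed-zero semantics for arrays of doubles.
object NaNSemanticsModel {
  // De-duplication key: doubleToLongBits maps every NaN to one canonical
  // bit pattern, and -0.0 is folded into 0.0 explicitly.
  private def canonicalKey(d: Double): Long =
    if (d == 0.0) 0L // true for both 0.0 and -0.0
    else java.lang.Double.doubleToLongBits(d)

  // array_distinct-style de-duplication that keeps the first occurrence.
  def distinctLikeSpark(xs: Seq[Double]): Seq[Double] = {
    val seen = scala.collection.mutable.HashSet.empty[Long]
    xs.filter(x => seen.add(canonicalKey(x))) // add returns false for repeats
  }

  // NaN sorts above every other value; -0.0 and 0.0 compare equal (assumption).
  val sparkDoubleOrdering: Ordering[Double] = new Ordering[Double] {
    def compare(x: Double, y: Double): Int =
      if (x == y) 0 else java.lang.Double.compare(x, y)
  }

  // array_max / array_min skip NULL elements; no remaining elements yields NULL.
  def arrayMax(xs: Seq[Option[Double]]): Option[Double] = {
    val present = xs.flatten
    if (present.isEmpty) None else Some(present.max(sparkDoubleOrdering))
  }

  def arrayMin(xs: Seq[Option[Double]]): Option[Double] = {
    val present = xs.flatten
    if (present.isEmpty) None else Some(present.min(sparkDoubleOrdering))
  }
}
```

An implementation that hashes raw float bits, or orders NaN below other values, diverges from this model on exactly the inputs the two issues describe.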
   
   The existing issue #3178 already documents the `array_join` null-handling gap and is 
referenced from the `array_join` sub-bullet. Other findings from the audit 
(e.g. lifting convert-time restrictions into `getSupportLevel` for 
`CometArrayPosition`, `CometArrayRemove`, `CometElementAt`, `CometFlatten`, 
and the foldable `ascendingOrder` case of `CometSortArray`; relabeling the dead 
`CometArrayCompact`/`CometArrayAppend` registrations) are noted in the doc 
but are too invasive for this audit PR and can be handled in follow-ups.
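   Lifting a convert-time restriction into `getSupportLevel` means rejecting the unsupported shape before conversion starts, so the reason can be reported up front instead of surfacing only as a failed conversion. A minimal sketch for the `CometSortArray` `ascendingOrder` case follows; the mini expression tree (`Expr`, `Literal`, `Column`, `Not`, `SortArray`), the `Support` types, and `CometSortArraySketch` are all hypothetical and are not Spark's or Comet's real classes.

```scala
// Hypothetical mini expression tree: just enough to tell a Literal
// apart from other foldable (constant) expressions.
sealed trait Expr { def foldable: Boolean }
case class Literal(value: Any) extends Expr { val foldable = true }
case class Column(name: String) extends Expr { val foldable = false }
case class Not(child: Expr) extends Expr { def foldable: Boolean = child.foldable }
case class SortArray(base: Expr, ascendingOrder: Expr)

sealed trait Support
case object Supported extends Support
case class Unsupported(reason: String) extends Support

object CometSortArraySketch {
  // The restriction is checked here, before conversion, with a reason attached.
  def getSupportLevel(e: SortArray): Support = e.ascendingOrder match {
    case Literal(_: Boolean) => Supported
    case other if other.foldable =>
      Unsupported("ascendingOrder must be a boolean Literal; foldable expressions are not handled yet")
    case _ => Unsupported("ascendingOrder must be a boolean Literal")
  }

  // Conversion can then assume the check passed; it returns the sort
  // direction it would serialize.
  def convert(e: SortArray): Option[Boolean] = e.ascendingOrder match {
    case Literal(b: Boolean) => Some(b)
    case _ => None
  }
}
```

Here `Not(Literal(false))` is foldable but not a `Literal`, which is the Spark 4.0 case that `CometSortArray` does not accept yet.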
   
   ### Audit process
   
   Audited using the `audit-comet-expression` skill (4 Spark versions per 
#4468), driven by 3 parallel agents covering creation/access, set/search, and 
aggregate/misc groups.
   
   ## How are these changes tested?
   
   - `./mvnw test -Dsuites="org.apache.comet.CometArrayExpressionSuite" 
-Dtest=none` (39 tests pass)
   - `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite 
expressions/array/" -Dtest=none` (34 tests pass)
   - `make core` succeeds.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
