andygrove opened a new pull request, #4483: URL: https://github.com/apache/datafusion-comet/pull/4483
## Which issue does this PR close? Closes #. ## Rationale for this change Continuation of the per-category expression audit. Same pattern as #4480 (predicate), #4479 (bitwise), #4478 (map), #4476 (hash), #4475 (conditional), #4474 (misc), #4473 (collection), #4470 (json), #4469 (struct), using the updated `audit-comet-expression` skill in #4468. ## What changes are included in this PR? ### Support-doc audit notes Add per-version audit sub-bullets to all 19 not-yet-audited array expressions (`array`, `array_append`, `array_compact`, `array_contains`, `array_distinct`, `array_except`, `array_join`, `array_max`, `array_min`, `array_position`, `array_remove`, `array_repeat`, `array_union`, `arrays_overlap`, `arrays_zip`, `element_at`, `flatten`, `get`, `sort_array`). Add 4.1.1 audit lines to the already-audited `array_insert` and `array_intersect`. Highlights from the cross-version review: - Spark 4.0 across the category does the `NullIntolerant` -> `nullIntolerant: Boolean` field refactor and (for elements participating in collation) widens `StringType` inputs to `StringTypeWithCollation`. - `ArrayAppend` becomes `RuntimeReplaceable` in 4.0 (rewrites to `ArrayInsert(arr, -1, elem)`); `CometArrayAppend` is unreachable in 4.0+. - `ArrayCompact` is always `RuntimeReplaceable`; Comet dispatches through `CometArrayFilter`. - `SortArray` 4.0 widens `ascendingOrder` from a `Literal` to any foldable boolean; `CometSortArray` still requires `Literal`. - `ElementAt` and `GetArrayItem` ANSI default flips to true in 4.0. ### Support-level consistency fixes (in `arrays.scala`) - `CometArrayExcept`: `Incompatible(None)` -> `Incompatible(Some(reason))` via a shared `private val` so the EXPLAIN message matches the compatibility-guide text. - `CometArrayJoin`: same fix. ### Tracking issues filed for follow-up - #4481 `array_distinct` / `array_union` / `array_except` do not canonicalize NaN / signed-zero like Spark's `SQLOpenHashSet`. - #4482 `array_max` / `array_min` disagree with Spark on NaN ordering. Existing #3178 already documents the `array_join` null-handling gap and is referenced from the `array_join` sub-bullet. Other findings from the audit (e.g. lifting convert-time restrictions into `getSupportLevel` for `CometArrayPosition`, `CometArrayRemove`, `CometElementAt`, `CometFlatten`, `CometSortArray.ascendingOrder`-foldable; relabeling `CometArrayCompact`/`CometArrayAppend` dead registrations) are noted in the doc but are too invasive for this audit PR; they can ride in follow-ups. ### Audit process Audited using the `audit-comet-expression` skill (4 Spark versions per #4468), driven by 3 parallel agents covering creation/access, set/search, and aggregate/misc groups. ## How are these changes tested? - `./mvnw test -Dsuites="org.apache.comet.CometArrayExpressionSuite" -Dtest=none` (39 tests pass) - `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite expressions/array/" -Dtest=none` (34 tests pass) - `make core` succeeds. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
