andygrove opened a new pull request, #4486:
URL: https://github.com/apache/datafusion-comet/pull/4486
## Which issue does this PR close?
Closes #.
## Rationale for this change
Continuation of the per-category expression audit. Same pattern as #4483
(array), #4480 (predicate), #4479 (bitwise), #4478 (map), #4476 (hash), #4475
(conditional), #4474 (misc), #4473 (collection), #4470 (json), #4469 (struct),
using the updated `audit-comet-expression` skill in #4468.
## What changes are included in this PR?
### Support-doc audit notes
Add per-version audit sub-bullets to all 60 supported math SQL function
names (`%`, `*`, `+`, `-`, `/`, `abs`, all trig and inverse-trig, all
log/exp/pow/sqrt/cbrt, `bin`, `hex`, `unhex`, `greatest`, `least`,
`ceil`/`ceiling`/`floor`/`round`/`rint`/`sign`/`signum`,
`div`/`mod`/`negative`/`positive`/`pi`,
`try_add`/`try_divide`/`try_multiply`/`try_subtract`, `width_bucket`). Add
a 4.1.1 audit line to the already-audited `factorial`. `rand` / `randn`
cross-reference the misc audit (#4474); `shiftleft` cross-references the
bitwise audit (#4479).
Highlights from the cross-version review:
- All trig and exp/log helpers are byte-for-byte stable across the four
Spark versions; the only systemic change in 4.0 is the refactor of the
`NullIntolerant` trait into a `nullIntolerant: Boolean` field.
- 4.0 also adds the `DefaultStringProducingExpression` trait on `Bin` /
`Hex` / `Unhex` and widens their `StringType` inputs to
`StringTypeWithCollation`.
- `UnaryPositive` becomes `RuntimeReplaceable` in 4.0; in 3.4/3.5 `+col`
silently disables Comet for the enclosing projection because there is no
`CometUnaryPositive` serde.
- `CometRemainder` rejects `EvalMode.TRY`, so `try_mod` on Spark 4.0+ falls
back (filed as #4484).
- `Round` falls back to Spark for Float / Double inputs because Spark's
BigDecimal-via-toString rounding cannot be matched exactly.
- `width_bucket` (Spark 3.5+) is wired via per-version `CometExprShim`
rather than a `CometExpressionSerde`, bypassing the support-level framework
(filed as #4485).
- `hex` / `unhex` do not propagate Spark 4.x collation; covered by the
umbrella #2190.
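
To illustrate why the `Round` bullet is hard to match natively, here is a minimal Java sketch. It assumes "BigDecimal-via-toString" means rounding a double through its shortest decimal string rather than through its exact binary value. The class and method names are hypothetical; this is not Spark or Comet code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundViaToString {
    // Assumption: round using the decimal string that Double.toString prints.
    static double roundViaToString(double x, int scale) {
        return new BigDecimal(Double.toString(x))
                .setScale(scale, RoundingMode.HALF_UP)
                .doubleValue();
    }

    // Contrast: round using the exact binary value stored in the double.
    static double roundViaExactBinary(double x, int scale) {
        return new BigDecimal(x)
                .setScale(scale, RoundingMode.HALF_UP)
                .doubleValue();
    }

    public static void main(String[] args) {
        double x = 2.675; // stored as 2.67499999999999982...
        System.out.println(roundViaToString(x, 2));    // prints 2.68
        System.out.println(roundViaExactBinary(x, 2)); // prints 2.67
    }
}
```

The two paths disagree whenever the shortest decimal string sits exactly on a rounding boundary that the stored binary value falls just short of, so a native implementation working on the raw bits would have to reproduce the string conversion to agree.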
### Support-level consistency fixes
None in this PR. The audit surfaced several `convert`-time `withInfo`
fallbacks in `arithmetic.scala`: `CometAdd`, `CometSubtract`, `CometMultiply`,
`CometDivide`, `CometIntegralDivide`, `CometRemainder`, `CometUnaryMinus`, and
`CometRound` all rely on the default `Compatible(None)` support level and bail
out from `convert` in several cases. Lifting these checks into
`getSupportLevel` / `getUnsupportedReasons` is mechanical but touches an
unusually load-bearing file, so it is deliberately deferred to follow-ups to
keep this audit PR low-risk.
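
The deferred refactor can be pictured with a small self-contained model. This is a sketch under stated assumptions, not Comet's API: `SupportLevel`, `Expr`, and every method below are hypothetical stand-ins, and the `TRY` check borrows the `CometRemainder` case only as an example condition.

```java
import java.util.Optional;

public class SupportLevelSketch {
    // Hypothetical stand-in for a support-level result.
    static final class SupportLevel {
        final boolean supported;
        final Optional<String> reason;
        SupportLevel(boolean supported, Optional<String> reason) {
            this.supported = supported;
            this.reason = reason;
        }
        static SupportLevel compatible() {
            return new SupportLevel(true, Optional.empty());
        }
        static SupportLevel unsupported(String reason) {
            return new SupportLevel(false, Optional.of(reason));
        }
    }

    // Hypothetical expression description: a name plus an eval mode string.
    static final class Expr {
        final String name;
        final String evalMode;
        Expr(String name, String evalMode) {
            this.name = name;
            this.evalMode = evalMode;
        }
    }

    // Before: the support level always reports compatible ...
    static SupportLevel supportLevelBefore(Expr e) {
        return SupportLevel.compatible();
    }

    // ... and the unsupported case is only discovered during conversion.
    static Optional<String> convertBefore(Expr e) {
        if ("TRY".equals(e.evalMode)) {
            return Optional.empty(); // convert-time fallback
        }
        return Optional.of("native:" + e.name);
    }

    // After: the same condition is reported up front.
    static SupportLevel supportLevelAfter(Expr e) {
        if ("TRY".equals(e.evalMode)) {
            return SupportLevel.unsupported("TRY eval mode is not supported");
        }
        return SupportLevel.compatible();
    }

    // Conversion no longer needs its own check once the level gates it.
    static Optional<String> convertAfter(Expr e) {
        return Optional.of("native:" + e.name);
    }

    public static void main(String[] args) {
        Expr tryMod = new Expr("remainder", "TRY");
        System.out.println(supportLevelBefore(tryMod).supported); // prints true
        System.out.println(convertBefore(tryMod).isPresent());    // prints false
        System.out.println(supportLevelAfter(tryMod).supported);  // prints false
    }
}
```

In the "before" shape the declared support level and the actual behavior disagree, which is the inconsistency the audit surfaced; the "after" shape makes the declared level match what conversion will do.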
### Tracking issues filed for follow-up
- #4484 `try_mod` falls back to Spark because `CometRemainder` rejects
`EvalMode.TRY`.
- #4485 `width_bucket` bypasses the `CometExpressionSerde` framework.
### Audit process
Audited using the `audit-comet-expression` skill (4 Spark versions per
#4468), driven by 4 parallel agents covering arithmetic ops, trig + constants,
log/exp/power/rounding, and misc (bin/hex/etc).
## How are these changes tested?
- `make core` succeeds (no code changes; doc only).
- Existing test coverage in `CometExpressionSuite`,
`CometMathExpressionSuite`, `CometSqlFileTestSuite expressions/math/`, and
`expressions/string/{hex,unhex}.sql` remains unchanged.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]