andygrove opened a new pull request, #4778:
URL: https://github.com/apache/datafusion-comet/pull/4778
## Which issue does this PR close?
Closes #419.
## Rationale for this change
Comet already supports `unbase64` via the codegen dispatcher but had no
support for the inverse `base64` function, so any query using `base64` fell
back to Spark for the enclosing operator. This PR wires up `base64` so that
such operators stay in Comet.
## What changes are included in this PR?
`base64` is routed through the JVM codegen dispatcher, which runs Spark's
own encoder inside the per-batch kernel so results match Spark exactly
(including the `chunkBase64` flag). The Catalyst representation differs by
Spark version, so both forms are handled:
- On Spark 3.5 and 4.0, `Base64` is `RuntimeReplaceable` and lowers to
`StaticInvoke(Base64.encode, ...)`. A new `("encode", classOf[Base64])` entry
in `CometStaticInvoke` dispatches it, mirroring the existing `decode` / AES
entries.
- On Spark 3.4, the `Base64` node survives to Comet and is handled directly
by a new `CometBase64` serde registered in `QueryPlanSerde`.
The expression reference (`expressions.md`) is updated to mark `base64` as
supported.
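
To illustrate what the `chunkBase64` flag affects, here is a minimal standalone sketch. It assumes the flag switches between MIME-style output (76-character lines separated by `\r\n`) and unbroken output. The class `Base64ChunkSketch` and its `encode` helper are hypothetical names used for illustration only; the sketch calls just the JDK's `java.util.Base64` and is not Spark's or Comet's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64ChunkSketch {
    // Hypothetical helper (not a Spark or Comet API): selects a JDK encoder
    // depending on whether chunked output is requested.
    static String encode(byte[] input, boolean chunk) {
        Base64.Encoder encoder = chunk
            ? Base64.getMimeEncoder()                  // 76-char lines joined by "\r\n"
            : Base64.getMimeEncoder(-1, new byte[0]);  // lineLength <= 0: no line breaks
        return new String(encoder.encode(input), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 60 input bytes encode to 80 base64 characters, which is longer
        // than one 76-character MIME line.
        byte[] data = new byte[60];
        Arrays.fill(data, (byte) 'a');

        String chunked = encode(data, true);
        String plain = encode(data, false);

        System.out.println(chunked.contains("\r\n")); // true
        System.out.println(plain.contains("\r\n"));   // false

        // The MIME decoder ignores line separators, so the chunked form
        // still round-trips to the original bytes.
        byte[] decoded = Base64.getMimeDecoder().decode(chunked);
        System.out.println(Arrays.equals(decoded, data)); // true
    }
}
```

The two outputs differ only for inputs long enough to exceed one line, which is why a native encoder that ignores the flag could agree with Spark on short strings and diverge on long ones. Running Spark's own encoder through the dispatcher sidesteps that risk.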
## How are these changes tested?
A new `base64.sql` file test mirrors the existing `unbase64.sql`, covering
column and literal inputs, the empty string, NULL, and a `base64`/`unbase64`
round trip. The test's `query` blocks use `checkSparkAnswerAndOperator`, which
asserts that Comet executes the expression rather than falling back. The test
passes on both Spark 3.4 (plain-node path) and Spark 3.5 (`StaticInvoke` path).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]