andygrove opened a new pull request, #4459:
URL: https://github.com/apache/datafusion-comet/pull/4459
## Which issue does this PR close?
Companion / alternative to #4283. Both PRs are drafts intended for
comparison, not for merging.
## Rationale for this change
This PR adds custom Rust scalar UDF support using only arrow-stable FFI
surfaces, as suggested in feedback on #4283 by @paleolimbot and @timsaucer. The
goal is to compare two ABI strategies side-by-side:
- #4283 — bespoke C ABI with custom type tags + IPC bytes +
`comet_udf_describe`
- this PR — arrow-only FFI, with two flavors offered side-by-side:
1. **C ABI (sedona-style)** — pure C-callable struct of function pointers,
parameterized only by Arrow's C Data Interface (`FFI_ArrowSchema` /
`FFI_ArrowArray`). No DataFusion types appear in the FFI surface, so user
libraries are decoupled from datafusion versions and the ABI is portable to
C/C++. Modeled on
[apache/sedona-db](https://github.com/apache/sedona-db/blob/6fe541cade2fcd7132feffb17091f83ac1411777/c/sedona-extension/src/sedona_extension.h#L114-L204)'s
`SedonaCScalarKernel`.
2. **datafusion-ffi (`FFI_ScalarUDF`)** — wraps the user's `ScalarUDFImpl`
as `FFI_ScalarUDF`. The user's library inherits the full `ScalarUDFImpl`
surface (variadic signatures, type coercion, metadata-aware return types) for
free, at the cost of a major-version pin against `datafusion-ffi`.
A single library may export either or both flavors; the host loader walks
both discovery functions. The test cdylib exposes `add_one_c` via the C ABI and
`add_one_df` via datafusion-ffi, and the end-to-end Spark suite drives both.
The scope is **intentionally minimal** vs #4283 — scalar-only, happy-path
e2e tests only — to keep the diff focused on the ABI strategy comparison. JVM
API surface mirrors #4283 exactly so the JVM side is apples-to-apples.
## What changes are included in this PR?
- **`native/comet-udf-sdk`** — public SDK with both ABI flavors behind cargo
features (`c-abi`, `df-abi`, both default-on). Provides `CometCScalarUdf` trait
+ `comet_c_udf_export!` macro for the C flavor; `comet_df_udf_export!` for
datafusion-ffi.
- **`native/comet-test-udfs`** — test cdylib exposing one UDF per ABI.
- **`native/core/src/execution/rust_udf`** — `loader` (libloading + ABI
version probe + dual-discovery), process-wide `cache`, `ImportedCScalarUdf`
adapter wrapping the C ABI as `ScalarUDFImpl`. The datafusion-ffi side needs no
adapter — `Arc<dyn ScalarUDFImpl>::from(&FFI_ScalarUDF)` is built-in.
- **`RustUdfCall` proto** in `expr.proto` (field 71) + planner branch in
`create_expr` resolving (library_path, name) against the cache and dispatching
through whichever ABI registered the kernel.
- **JNI bridge** (`CometRustUdfBridge` / `comet_rust_udf_bridge.rs`) for
driver-side `validateLibrary` / `listUdfs`.
- **Scala API** (`CometRustUDF.register`, `CometRustUdfRegistry`, typed
exception classes). The `QueryPlanSerde` path: `CometScalaUDF.convert` first
checks the registry — if the udfName is registered, it emits `RustUdfCall`;
otherwise it falls back to the existing JVM codegen dispatcher.
The Scala/Java files live under `spark/.../udf/` (rather than
`common/.../udf/` as in #4283) to avoid the `NativeBase`-relocation churn. This
is the only intentional structural divergence from #4283.
## How are these changes tested?
- **SDK unit tests** — 3 tests covering ABI version, C ABI adapter roundtrip
(export → import via FFI), and datafusion-ffi list build + round-trip via
`From<&FFI_ScalarUDF>`.
- **Native loader tests** — 5 tests covering library load, missing-path
error, and verifying both ABI flavors are discovered correctly.
- **End-to-end Spark suite** (`CometRustUdfSuite`) — 2 tests, one per ABI:
`add_one_c` (C ABI) and `add_one_df` (datafusion-ffi). Both pass, gated on
`-Dcomet.test.udfs.lib=<path>`.
```
$ DYLD_LIBRARY_PATH=$JAVA_HOME/lib/server cargo test -p datafusion-comet
--lib rust_udf
running 5 tests
test execution::rust_udf::loader::tests::missing_path_errors_open ... ok
test execution::rust_udf::loader::tests::load_test_udfs_succeeds ... ok
test execution::rust_udf::loader::tests::c_and_df_abi_have_expected_flavors
... ok
test execution::rust_udf::cache::tests::same_path_returns_same_arc ... ok
test execution::rust_udf::cache::tests::missing_path_propagates_error ... ok
test result: ok. 5 passed; 0 failed
$ ./mvnw -q -Pspark-3.5 -pl spark test
-DwildcardSuites=org.apache.comet.CometRustUdfSuite \
-Dtest=none -Denforcer.skip=true \
-Dcomet.test.udfs.lib=$PWD/native/target/debug/libcomet_test_udfs.dylib
- C ABI: add_one_c returns id + 1 for a range (6 seconds, 465 milliseconds)
- datafusion-ffi: add_one_df returns id + 1 for a range (69 milliseconds)
Run completed in 9 seconds, 43 milliseconds.
All tests passed.
```
## What this PR is *not*
- Not a full replacement for #4283 — half the test coverage, no user guide,
no struct-typed/registerAll/error/panic paths. The point is to make the ABI
surfaces comparable, not to ship a finished feature. Once a direction is
picked, the loser of the comparison can be deleted and the winner extended.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]