andygrove opened a new pull request, #4232: URL: https://github.com/apache/datafusion-comet/pull/4232
## Which issue does this PR close?

Part of https://github.com/apache/datafusion-comet/issues/4193

## Rationale for this change

This PR adds the core JVM UDF framework that enables Comet to invoke JVM-side UDF implementations operating on Arrow data via JNI. This allows us to quickly implement expressions with 100% Spark compatibility without re-implementing them in native Rust code: we call existing Java/Spark code but operate on Arrow data, avoiding an expensive fallback to Spark.

## What changes are included in this PR?

The framework consists of:

**JVM side:**
- `CometUDF` trait: interface that JVM UDF implementations must satisfy
- `CometUdfBridge`: JNI entry point that native execution calls to invoke a UDF; handles class instantiation caching, Arrow FFI import/export, and result validation
- `CometLambdaRegistry`: thread-safe registry bridging plan-time Spark expressions to execution-time UDF lookup

**Native (Rust) side:**
- `JvmScalarUdfExpr`: DataFusion `PhysicalExpr` that delegates evaluation to a JVM-side `CometUDF` via JNI and the Arrow C Data Interface
- `CometUdfBridge` JNI handle in `jni-bridge`: caches class/method references
- `JvmScalarUdf` protobuf message: serde format for transmitting UDF invocations from plan to execution

**Planner integration:**
- `ExprStruct::JvmScalarUdf` handling in the native planner

This is the framework only. Individual expression implementations (e.g., `array_exists`) will be added in follow-up PRs.

## How are these changes tested?

- Rust compilation verified (`cargo check` passes for all affected crates)
- End-to-end testing will come with the first expression implementation in a follow-up PR

🤖 Generated with [Claude Code](https://claude.ai/code)
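To make the JVM-side roles described above more concrete, here is a minimal, self-contained sketch of two of the ideas: caching one UDF instance per class name, and a thread-safe registry that hands out a key at plan time for lookup at execution time. It is not the code in this PR. Every name in it (`SketchUdf`, `SketchBridge`, `SketchRegistry`, `DoubleUdf`) is hypothetical, plain `long[]` arrays stand in for Arrow vectors, no JNI or Arrow FFI is involved, and the length check is only an assumed example of what "result validation" might cover.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

/** Hypothetical stand-in for a UDF interface; long[] replaces Arrow vectors here. */
interface SketchUdf {
    long[] evaluate(long[] input);
}

/** Hypothetical bridge: instantiates each UDF class once and reuses the instance. */
final class SketchBridge {
    private static final Map<String, SketchUdf> INSTANCES = new ConcurrentHashMap<>();

    static long[] invoke(String className, long[] input) {
        // computeIfAbsent is atomic per key, so concurrent callers share one instance.
        SketchUdf udf = INSTANCES.computeIfAbsent(className, SketchBridge::instantiate);
        long[] result = udf.evaluate(input);
        // Assumed validation rule: a scalar UDF returns one value per input row.
        if (result.length != input.length) {
            throw new IllegalStateException("expected " + input.length
                + " rows but got " + result.length);
        }
        return result;
    }

    private static SketchUdf instantiate(String className) {
        try {
            return (SketchUdf) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }
}

/** Hypothetical registry: register at plan time, look up by key at execution time. */
final class SketchRegistry {
    private static final Map<String, LongUnaryOperator> ENTRIES = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_ID = new AtomicLong();

    static String register(LongUnaryOperator fn) {
        String key = "lambda-" + NEXT_ID.incrementAndGet();
        ENTRIES.put(key, fn);
        return key;
    }

    static LongUnaryOperator lookup(String key) {
        LongUnaryOperator fn = ENTRIES.get(key);
        if (fn == null) {
            throw new IllegalStateException("no registered entry for " + key);
        }
        return fn;
    }

    static void remove(String key) {
        ENTRIES.remove(key);
    }
}

/** Example UDF for the sketch: doubles every input value. */
final class DoubleUdf implements SketchUdf {
    @Override
    public long[] evaluate(long[] input) {
        long[] out = new long[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = input[i] * 2;
        }
        return out;
    }
}
```

In this sketch a caller would run `SketchBridge.invoke(DoubleUdf.class.getName(), values)` for each batch, and only the first call pays the reflection cost. The registry key plays the role of the small identifier that could travel in a serialized plan in place of an object that cannot be serialized.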
