andygrove opened a new pull request, #4233: URL: https://github.com/apache/datafusion-comet/pull/4233
## Which issue does this PR close?

Part of https://github.com/apache/datafusion-comet/issues/4193

Builds on https://github.com/apache/datafusion-comet/pull/4232 (JVM UDF framework)

## Rationale for this change

This PR enables end users to provide their own `CometUDF` implementations that operate on Arrow columnar data, registered alongside standard Spark UDFs. When Comet encounters a matching UDF during planning, it routes to the vectorized Arrow implementation instead of falling back to Spark's row-at-a-time execution.

## What changes are included in this PR?

- **`CometUdfRegistry`**: a thread-safe registry mapping Spark UDF names to CometUDF implementation class names + metadata. Includes a convenience method that registers both with Spark and Comet in one call.
- **`CometScalaUdf` serde handler**: intercepts `ScalaUDF` expressions in query planning; if the UDF name is registered in `CometUdfRegistry`, emits a `JvmScalarUdf` proto for native execution.
- **User guide page** (`custom-jvm-udfs.md`): documents how to write, register, and deploy custom JVM UDFs.

### User-facing API:

```scala
// Register the Spark UDF (row-at-a-time fallback)
spark.udf.register("is_positive", (x: Int) => x > 0)

// Register the CometUDF (vectorized Arrow implementation)
CometUdfRegistry.register(
  "is_positive",
  "com.example.IsPositiveUdf",
  BooleanType,
  nullable = true
)
```

## How are these changes tested?

- JVM compilation verified (`mvn compile` passes for common + spark modules)
- End-to-end testing will come in a follow-up PR with a concrete UDF example

## Test plan

- [ ] Verify `CometUdfRegistry.register` + `ScalaUDF` interception emits `JvmScalarUdf` proto
- [ ] Verify fallback to Spark when UDF is not registered
- [ ] Verify user guide renders correctly

🤖 Generated with [Claude Code](https://claude.ai/code)
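### Illustrative sketch: registry lookup and fallback

For readers following the description above, here is a minimal sketch of how a thread-safe name-to-implementation registry and the planning-time routing decision might fit together. It is not the code in this PR. `SketchUdfRegistry`, `UdfEntry`, and `PlanSketch.route` are hypothetical names, and the return type is modelled as a plain `String` (rather than a Spark `DataType`) so the example has no Spark dependency.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical metadata record; the fields stored by the real registry
// are an assumption based on the register(...) call in the description.
final case class UdfEntry(className: String, returnType: String, nullable: Boolean)

// Hypothetical stand-in for CometUdfRegistry, NOT the actual implementation.
object SketchUdfRegistry {
  // ConcurrentHashMap gives thread-safe put/get without explicit locking.
  private val entries = new ConcurrentHashMap[String, UdfEntry]()

  def register(name: String, className: String, returnType: String, nullable: Boolean): Unit =
    entries.put(name, UdfEntry(className, returnType, nullable))

  // get returns null for a missing key; Option(...) turns that into None.
  def lookup(name: String): Option[UdfEntry] = Option(entries.get(name))
}

// Hypothetical planning-time decision: use the vectorized implementation
// if the UDF name is registered, otherwise fall back to Spark.
object PlanSketch {
  def route(udfName: String): String =
    SketchUdfRegistry.lookup(udfName) match {
      case Some(entry) => s"native:${entry.className}"
      case None        => "spark-fallback"
    }
}
```

In this sketch, registering `"is_positive"` makes `PlanSketch.route("is_positive")` select the registered class, while any unregistered name takes the fallback branch. That mirrors the two behaviours listed in the test plan.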
