andygrove opened a new pull request, #4387: URL: https://github.com/apache/datafusion-comet/pull/4387
## Which issue does this PR close?

Part of https://github.com/apache/datafusion-comet/issues/4193

Supersedes #4233 (closed).

## Rationale for this change

apache/main already ships the underlying JVM UDF framework: the `CometUDF` trait, the `JvmScalarUdf` proto, native dispatch via `CometUdfBridge`, and the Janino codegen dispatcher (#4267) for automatic `ScalaUDF` handling. What's missing is a way for end users to plug their own vectorized `CometUDF` implementation in directly, so they can hand-tune a columnar kernel for a specific function instead of going through codegen.

## What changes are included in this PR?

- `CometUDFRegistry` (new): a thread-safe registry mapping a Spark UDF name to a user-supplied `CometUDF` implementation class.
- `CometScalaUDF.convert` checks the registry first; if a registered name matches, it emits a `JvmScalarUdf` proto targeting the user class directly, with the children expressions as args. Unregistered UDFs continue through the codegen dispatcher (when enabled).
- `@org.apache.spark.annotation.Unstable` on `CometUDF` and `CometUDFRegistry` to signal that the user-facing surface may evolve.
- New user-guide page `custom_comet_udfs.md` documenting the contract, registration, routing precedence, and cluster deployment.

User-facing API:

```scala
spark.udf.register("plus_one", (x: Int) => x + 1)
CometUDFRegistry.register("plus_one", classOf[com.example.PlusOneUdf])
```

## How are these changes tested?

`CometRegisteredUdfSuite`:

- Registered `CometUDF` runs on the native path end-to-end (`checkSparkAnswerAndOperator`).
- An unregistered ScalaUDF falls back to Spark when codegen is disabled.
- `register` / `isRegistered` / `unregister` round-trip.

`[skip ci]` on this commit while iterating; will drop the tag once the design is settled.
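For readers who want a feel for the routing precedence described above, here is a minimal, self-contained sketch. It is not the code in this PR: `SketchUdf`, `PlusOneSketch`, `SketchUdfRegistry`, and the `Route` cases are hypothetical names invented for illustration, and backing the registry with a `ConcurrentHashMap` is an assumption about one way to make it thread-safe. It only models the decision order (registry first, then codegen when enabled, otherwise Spark fallback), not proto emission or native dispatch.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the CometUDF trait; the real contract is not shown here.
trait SketchUdf

// Hypothetical user-supplied implementation class.
class PlusOneSketch extends SketchUdf

// Possible routing outcomes, named for illustration only.
sealed trait Route
final case class DirectJvmUdf(className: String) extends Route // registered class is targeted directly
case object CodegenDispatch extends Route                      // unregistered, codegen enabled
case object SparkFallback extends Route                        // unregistered, codegen disabled

object SketchUdfRegistry {
  // Assumption: a concurrent map keyed by UDF name gives thread-safe access.
  private val entries = new ConcurrentHashMap[String, Class[_ <: SketchUdf]]()

  def register(name: String, clazz: Class[_ <: SketchUdf]): Unit = {
    entries.put(name, clazz)
    ()
  }

  def isRegistered(name: String): Boolean = entries.containsKey(name)

  def unregister(name: String): Unit = {
    entries.remove(name)
    ()
  }

  def lookup(name: String): Option[Class[_ <: SketchUdf]] =
    Option(entries.get(name))

  // Registry is consulted first; codegen is tried only for unregistered names.
  def route(name: String, codegenEnabled: Boolean): Route =
    lookup(name) match {
      case Some(clazz)            => DirectJvmUdf(clazz.getName)
      case None if codegenEnabled => CodegenDispatch
      case None                   => SparkFallback
    }
}
```

In this sketch a registered name takes the direct path regardless of the codegen flag, which mirrors the "checks the registry first" wording; whether the real `CometScalaUDF.convert` applies additional conditions is not stated in this description.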
