andygrove opened a new pull request, #4387:
URL: https://github.com/apache/datafusion-comet/pull/4387

   ## Which issue does this PR close?
   
   Part of https://github.com/apache/datafusion-comet/issues/4193
   
   Supersedes #4233 (closed).
   
   ## Rationale for this change
   
   apache/main already ships the underlying JVM UDF framework: the `CometUDF` 
trait, the `JvmScalarUdf` proto, native dispatch via `CometUdfBridge`, and the 
Janino codegen dispatcher (#4267) for automatic `ScalaUDF` handling. What's 
missing is a way for end users to plug their own vectorized `CometUDF` 
implementation in directly, so they can hand-tune a columnar kernel for a 
specific function instead of going through codegen.
   
   ## What changes are included in this PR?
   
   - `CometUDFRegistry` (new): a thread-safe registry mapping a Spark UDF name 
to a user-supplied `CometUDF` implementation class.
   - `CometScalaUDF.convert` checks the registry first; if the UDF's name is 
registered, it emits a `JvmScalarUdf` proto that targets the user class directly, 
passing the child expressions as arguments. Unregistered UDFs continue through 
the codegen dispatcher (when enabled).
   - `@org.apache.spark.annotation.Unstable` on `CometUDF` and 
`CometUDFRegistry` to signal that the user-facing surface may evolve.
   - New user-guide page `custom_comet_udfs.md` documenting the contract, 
registration, routing precedence, and cluster deployment.
   
   User-facing API:
   
   ```scala
   spark.udf.register("plus_one", (x: Int) => x + 1)
   CometUDFRegistry.register("plus_one", classOf[com.example.PlusOneUdf])
   ```
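   
   To make the routing precedence concrete, here is a minimal, self-contained 
sketch of the lookup-first behavior described above. It is not the code in this 
PR: `SketchUdf`, `PlusOneSketch`, `SketchRegistry`, `Routing`, and the `Route` 
cases are hypothetical stand-ins, and the real `CometUDF` trait and 
`CometUDFRegistry` signatures may differ. Backing the registry with a 
`ConcurrentHashMap` is also an assumption about how thread safety could be 
achieved.
   
   ```scala
   import java.util.concurrent.ConcurrentHashMap

   // Hypothetical stand-in for the real CometUDF trait, whose actual
   // signature is not shown in this description.
   trait SketchUdf

   // Hypothetical user implementation, analogous to com.example.PlusOneUdf.
   class PlusOneSketch extends SketchUdf

   // Illustrative registry: maps a Spark UDF name to an implementation class.
   // ConcurrentHashMap makes each individual operation thread-safe.
   object SketchRegistry {
     private val classes =
       new ConcurrentHashMap[String, Class[_ <: SketchUdf]]()

     def register(name: String, cls: Class[_ <: SketchUdf]): Unit = {
       classes.put(name, cls)
     }

     def unregister(name: String): Unit = {
       classes.remove(name)
     }

     def isRegistered(name: String): Boolean = classes.containsKey(name)

     // Option(...) turns the null returned for a missing key into None.
     def lookup(name: String): Option[Class[_ <: SketchUdf]] =
       Option(classes.get(name))
   }

   // Possible outcomes when routing a ScalaUDF call (names are illustrative).
   sealed trait Route
   final case class UserClass(className: String) extends Route
   case object CodegenDispatcher extends Route
   case object SparkFallback extends Route

   object Routing {
     // The registry is consulted first; unregistered names go to the codegen
     // dispatcher when it is enabled and fall back to Spark otherwise.
     def route(udfName: String, codegenEnabled: Boolean): Route =
       SketchRegistry.lookup(udfName) match {
         case Some(cls)              => UserClass(cls.getName)
         case None if codegenEnabled => CodegenDispatcher
         case None                   => SparkFallback
       }
   }
   ```
   
   In this sketch a registered name always takes the user-class route, 
regardless of the codegen setting, which mirrors the "registry first" ordering 
in the change list.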
   
   ## How are these changes tested?
   
   `CometRegisteredUdfSuite`:
   - Registered `CometUDF` runs on the native path end-to-end 
(`checkSparkAnswerAndOperator`).
   - An unregistered ScalaUDF falls back to Spark when codegen is disabled.
   - `register` / `isRegistered` / `unregister` round-trip.
   
   `[skip ci]` on this commit while iterating; will drop the tag once the 
design is settled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


