andygrove opened a new pull request, #4233:
URL: https://github.com/apache/datafusion-comet/pull/4233

   ## Which issue does this PR close?
   
   Part of https://github.com/apache/datafusion-comet/issues/4193
   
   Builds on https://github.com/apache/datafusion-comet/pull/4232 (JVM UDF framework)
   
   ## Rationale for this change
   
   This PR lets end users provide their own `CometUDF` implementations that operate on Arrow columnar data and register them alongside standard Spark UDFs. When Comet encounters a matching UDF during planning, it routes to the vectorized Arrow implementation instead of falling back to Spark's row-at-a-time execution.
   
   ## What changes are included in this PR?
   
   - **`CometUdfRegistry`**: a thread-safe registry mapping Spark UDF names to `CometUDF` implementation class names and metadata. Includes a convenience method that registers the UDF with both Spark and Comet in one call.
   - **`CometScalaUdf` serde handler**: intercepts `ScalaUDF` expressions during query planning; if the UDF name is registered in `CometUdfRegistry`, emits a `JvmScalarUdf` proto for native execution.
   - **User guide page** (`custom-jvm-udfs.md`): documents how to write, register, and deploy custom JVM UDFs.
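   
   The lookup-then-route behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SketchUdfRegistry`, `UdfEntry`, and `SketchPlanner` are hypothetical names, the return type is reduced to a string, and the planning result is a plain string rather than a `JvmScalarUdf` proto.

   ```scala
   import java.util.concurrent.ConcurrentHashMap

   // Hypothetical stand-in for the per-UDF metadata; the real fields may differ.
   final case class UdfEntry(className: String, returnType: String, nullable: Boolean)

   // Hypothetical registry sketch. ConcurrentHashMap provides thread-safe
   // reads and writes without explicit locking.
   object SketchUdfRegistry {
     private val entries = new ConcurrentHashMap[String, UdfEntry]()

     def register(name: String, className: String, returnType: String, nullable: Boolean): Unit =
       entries.put(name, UdfEntry(className, returnType, nullable))

     // ConcurrentHashMap.get returns null for a missing key; Option(...) maps that to None.
     def lookup(name: String): Option[UdfEntry] = Option(entries.get(name))
   }

   // Hypothetical planning decision, reduced to strings for illustration.
   object SketchPlanner {
     def plan(udfName: String): String =
       SketchUdfRegistry.lookup(udfName) match {
         case Some(entry) => s"native:${entry.className}" // registered: use the vectorized implementation
         case None        => "spark-fallback"             // not registered: leave the expression to Spark
       }
   }
   ```

   With this sketch, registering `"is_positive"` makes `SketchPlanner.plan("is_positive")` take the native branch, while an unregistered name takes the fallback branch.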
   
   ### User-facing API:
   
   ```scala
   import org.apache.spark.sql.types.BooleanType
   
   // Register the Spark UDF (row-at-a-time fallback)
   spark.udf.register("is_positive", (x: Int) => x > 0)
   
   // Register the CometUDF (vectorized Arrow implementation)
   CometUdfRegistry.register(
     "is_positive",
     "com.example.IsPositiveUdf",
     BooleanType,
     nullable = true
   )
   ```
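   
   To illustrate the difference between the row-at-a-time fallback and a vectorized implementation, here is a minimal sketch of what the columnar version of `is_positive` might compute. The `CometUDF` interface itself is not shown in this description, so the sketch does not use it: `IntColumn`, `BoolColumn`, and `IsPositiveSketch` are hypothetical stand-ins that model an Arrow vector as a values array plus a validity mask.

   ```scala
   // Hypothetical stand-ins for Arrow vectors: values plus a validity mask
   // (valid(i) == false means row i is null).
   final case class IntColumn(values: Array[Int], valid: Array[Boolean])
   final case class BoolColumn(values: Array[Boolean], valid: Array[Boolean])

   object IsPositiveSketch {
     // One call processes a whole batch instead of one row per invocation.
     def evaluate(input: IntColumn): BoolColumn = {
       val n = input.values.length
       val out = new Array[Boolean](n)
       var i = 0
       while (i < n) {
         // Null rows get a placeholder value of false; the validity mask marks them as null.
         out(i) = input.valid(i) && input.values(i) > 0
         i += 1
       }
       // Null in, null out: the output reuses a copy of the input validity mask.
       BoolColumn(out, input.valid.clone())
     }
   }
   ```

   Propagating the validity mask unchanged is one reading of `nullable = true` in the registration call above; a real implementation would need to match whatever null semantics the Spark UDF has.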
   
   ## How are these changes tested?
   
   - JVM compilation verified (`mvn compile` passes for common + spark modules)
   - End-to-end testing will come in a follow-up PR with a concrete UDF example
   
   ## Test plan
   
   - [ ] Verify `CometUdfRegistry.register` + `ScalaUDF` interception emits `JvmScalarUdf` proto
   - [ ] Verify fallback to Spark when UDF is not registered
   - [ ] Verify user guide renders correctly
   
   🤖 Generated with [Claude Code](https://claude.ai/code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]