andygrove opened a new pull request, #4239:
URL: https://github.com/apache/datafusion-comet/pull/4239

   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   This PR extends the JVM UDF framework (introduced in #4232) to support all 
Spark regular expression functions with full Java regex compatibility. 
Previously only `rlike` was implemented.
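
To make the compatibility gap concrete, here is a minimal sketch of the three pattern features this PR mentions, written against plain `java.util.regex`. The class and method names are hypothetical and are not part of this PR. The assumption is that the native path relies on the Rust `regex` crate, whose documentation states that it does not support look-around or backreferences.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not code from this PR.
class JavaRegexFeatures {
    // Backreference: \1 must repeat exactly what group 1 captured.
    static boolean hasDoubledWord(String s) {
        return Pattern.compile("\\b(\\w+) \\1\\b").matcher(s).find();
    }

    // Lookahead: match digits only when "px" follows, without consuming "px".
    static String digitsBeforePx(String s) {
        Matcher m = Pattern.compile("\\d+(?=px)").matcher(s);
        return m.find() ? m.group() : "";
    }

    // Embedded flag: (?i) enables case-insensitive matching inside the pattern.
    static boolean containsSparkIgnoreCase(String s) {
        return Pattern.compile("(?i)spark").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledWord("the the cat"));
        System.out.println(digitsBeforePx("width: 12px"));
        System.out.println(containsSparkIgnoreCase("Apache SPARK"));
    }
}
```

Patterns like the first two are the ones that need a Java-compatible engine to match Spark's behavior.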
   
   **Note: This PR is stacked on #4232 (JVM UDF framework) and should be 
reviewed/merged after that PR.**
   
   ## What changes are included in this PR?
   
   - Add JVM UDF implementations for `regexp_extract`, `regexp_extract_all`, 
`regexp_instr`, `regexp_replace`, and `split`
   - Change default regexp engine from `rust` to `java` for full Spark 
compatibility (backreferences, lookaheads, embedded flags)
   - Add serde support to route these expressions through the JVM UDF bridge
   - Add Arrow schema normalization in the Rust JVM UDF executor (handles 
ListVector field naming differences between Arrow Java and DataFusion)
   - Reorganize SQL test files: separate files per engine (`*_rust.sql`, 
`*_rust_enabled.sql`, `*_java.sql`)
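
For reviewers less familiar with the Spark semantics of these functions, the sketch below approximates three of them with `java.util.regex`. It is an illustration under stated assumptions, not the UDF code added in this PR: the class and method names are hypothetical, null handling and Arrow vector plumbing are omitted, and positions are counted in Java `char` units.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of Spark-like semantics; not the implementation in this PR.
class RegexpSemanticsSketch {
    // regexp_extract: the requested group of the first match, or "" when
    // there is no match or the group did not participate in the match.
    static String regexpExtract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "";
        }
        String g = m.group(group);
        return g == null ? "" : g;
    }

    // regexp_extract_all: the requested group from every match, in order.
    static List<String> regexpExtractAll(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            String g = m.group(group);
            out.add(g == null ? "" : g);
        }
        return out;
    }

    // regexp_instr: 1-based start position of the first match, or 0 if none.
    static int regexpInstr(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() + 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(regexpExtract("100-200", "(\\d+)-(\\d+)", 1));
        System.out.println(regexpExtractAll("100-200, 300-400", "(\\d+)-(\\d+)", 1));
        System.out.println(regexpInstr("user@spark.apache.org", "@[^.]*"));
    }
}
```

A real implementation would compile the pattern once per batch rather than once per row; the per-call `Pattern.compile` here only keeps the sketch short.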
   
   ## How are these changes tested?
   
   - `CometRegExpJvmSuite`: 45 tests covering all regexp expressions
   - 12 SQL test files covering both Java and Rust engine paths
   - Existing `CometExpressionSuite` and `CometStringExpressionSuite` tests 
continue to pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

