andygrove opened a new pull request, #4239:
URL: https://github.com/apache/datafusion-comet/pull/4239
## Which issue does this PR close?

Closes #.

## Rationale for this change

This PR extends the JVM UDF framework (introduced in #4232) to support all Spark regular expression functions with full Java regex compatibility. Previously, only `rlike` was implemented on this framework.

**Note: This PR is stacked on #4232 (JVM UDF framework) and should be reviewed/merged after that PR.**

## What changes are included in this PR?

- Add JVM UDF implementations for `regexp_extract`, `regexp_extract_all`, `regexp_instr`, `regexp_replace`, and `split`
- Change the default regexp engine from `rust` to `java` for full Spark compatibility (backreferences, lookaheads, embedded flags)
- Add serde support to route these expressions through the JVM UDF bridge
- Add Arrow schema normalization in the Rust JVM UDF executor (handles `ListVector` field-naming differences between Arrow Java and DataFusion)
- Reorganize SQL test files into separate files per engine (`*_rust.sql`, `*_rust_enabled.sql`, `*_java.sql`)

## How are these changes tested?

- `CometRegExpJvmSuite`: 45 tests covering all regexp expressions
- 12 SQL test files covering both the Java and Rust engine paths
- Existing `CometExpressionSuite` and `CometStringExpressionSuite` tests continue to pass
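For reviewers less familiar with the Spark functions this PR routes through the JVM UDF bridge, here is a per-row sketch of what each one computes, written against plain `java.util.regex`. It is not the Comet JVM UDF implementation. `SparkRegexpSemantics` and its methods are hypothetical names, the sketch assumes Spark's documented semantics, and it ignores nulls and optional arguments such as `regexp_replace`'s `position` and `regexp_instr`'s group index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Comet code): single-row sketches of Spark regexp functions.
// Null inputs and optional Spark arguments are intentionally not handled.
final class SparkRegexpSemantics {
    private SparkRegexpSemantics() {}

    // regexp_extract: group `idx` of the first match, or "" when nothing matches.
    static String regexpExtract(String s, String regex, int idx) {
        Matcher m = Pattern.compile(regex).matcher(s);
        if (!m.find()) {
            return "";
        }
        String g = m.group(idx);
        return g == null ? "" : g; // a group that did not participate yields ""
    }

    // regexp_extract_all: group `idx` of every match, in order of appearance.
    static List<String> regexpExtractAll(String s, String regex, int idx) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            String g = m.group(idx);
            out.add(g == null ? "" : g);
        }
        return out;
    }

    // regexp_instr: 1-based start position of the first match, or 0 when nothing matches.
    static int regexpInstr(String s, String regex) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.start() + 1 : 0;
    }

    // regexp_replace: replace every match; "$1"-style group references are expanded.
    static String regexpReplace(String s, String regex, String rep) {
        return Pattern.compile(regex).matcher(s).replaceAll(rep);
    }

    // split: Spark treats limit 0 like -1, so trailing empty strings are kept,
    // unlike Java's own String.split(regex, 0), which drops them.
    static String[] split(String s, String regex, int limit) {
        return s.split(regex, limit == 0 ? -1 : limit);
    }
}
```

For example, `regexpInstr("user@spark.apache.org", "@[^.]*")` returns `5`, and `split("oneAtwoBthreeC", "[ABC]", 0)` returns `["one", "two", "three", ""]`. The `limit == 0` mapping is the kind of detail where delegating to Java, rather than re-implementing in Rust, makes exact Spark compatibility easier.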

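The PR makes `java` the default regexp engine so that backreferences, lookaheads, and embedded flags behave as they do in Spark. For context, the Rust `regex` crate (presumably what the Rust engine path relies on) rejects backreferences and lookaround outright. It also gives some inline flag letters a different meaning: in Java, `(?U)` turns on Unicode character classes, while in the `regex` crate it swaps greediness. The standalone sketch below exercises each feature with `java.util.regex` only. `JavaRegexFeatures` and its methods are hypothetical names, and this is not Comet code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration (not Comet code) of the Java regex features cited in the PR.
final class JavaRegexFeatures {
    private JavaRegexFeatures() {}

    // Backreference: \1 must repeat exactly what group 1 captured (e.g. "the the").
    static boolean hasDoubledWord(String s) {
        return Pattern.compile("\\b(\\w+) \\1\\b").matcher(s).find();
    }

    // Lookahead: digits match only when followed by "px"; "px" itself is not consumed.
    static String digitsBeforePx(String s) {
        Matcher m = Pattern.compile("\\d+(?=px)").matcher(s);
        return m.find() ? m.group() : "";
    }

    // Embedded flag: (?U) is Java's UNICODE_CHARACTER_CLASS, so \w also matches letters like 'é'.
    static boolean isWordUnicode(String s) {
        return Pattern.compile("(?U)\\w+").matcher(s).matches();
    }

    // Without the flag, Java's \w is ASCII-only ([a-zA-Z_0-9]).
    static boolean isWordAscii(String s) {
        return Pattern.compile("\\w+").matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledWord("the the cat"));  // true
        System.out.println(hasDoubledWord("the cat"));      // false
        System.out.println(digitsBeforePx("width: 120px")); // 120
        System.out.println(isWordUnicode("caf\u00e9"));     // true
        System.out.println(isWordAscii("caf\u00e9"));       // false
    }
}
```

Because the same pattern string can be rejected by one engine, or silently mean something else in it, a Java-backed default is the conservative choice when results must match Spark exactly.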