Brijesh-Thakkar opened a new pull request, #2988:
URL: https://github.com/apache/datafusion-comet/pull/2988

   Fixes #2977
   
   ## Rationale for this change
   
   Comet currently falls back to JVM-based implementations for string trimming 
functions, which makes these expressions slower than Spark itself 
(approximately 0.6–0.7x in benchmarks, as reported in #2977).
   
   This change introduces native Rust implementations for trim-related string 
expressions, eliminating JVM overhead and unnecessary allocations. The goal is 
to match, and ideally exceed, Spark's baseline performance for these operations.
   
   ## What changes are included in this PR?
   
   - Add `trim.rs` containing native Rust implementations for:
     - `spark_trim`
     - `spark_ltrim`
     - `spark_rtrim`
     - `spark_btrim`
   - Use efficient Arrow array operations directly instead of JVM fallbacks
   - Introduce a fast-path optimization for strings without leading or trailing 
whitespace
   - Support both `Utf8` and `LargeUtf8` Arrow string array types
   - Add comprehensive unit tests covering all trim variants and edge cases
   
   The implementation avoids JVM execution paths and reduces allocations that 
previously caused the observed performance degradation.
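
   To illustrate the fast-path idea from the list above, here is a minimal, 
std-only sketch. It is not the code in `trim.rs`: `TrimSide`, `needs_trim`, 
`trim_value`, and `trim_column` are hypothetical names, a slice of 
`Option<&str>` stands in for an Arrow string array, and the sketch assumes that 
only the ASCII space character (0x20) is trimmed.

```rust
/// Which end(s) of the string to trim (hypothetical helper type).
#[derive(Clone, Copy)]
pub enum TrimSide {
    Both,
    Left,
    Right,
}

/// Cheap check on the first and last byte only.
/// Assumption: only ASCII space (0x20) counts as trimmable.
pub fn needs_trim(s: &str, side: TrimSide) -> bool {
    let bytes = s.as_bytes();
    let leading = bytes.first() == Some(&b' ');
    let trailing = bytes.last() == Some(&b' ');
    match side {
        TrimSide::Both => leading || trailing,
        TrimSide::Left => leading,
        TrimSide::Right => trailing,
    }
}

/// Trim one value. Returns a sub-slice of the input, so nothing is allocated.
pub fn trim_value(s: &str, side: TrimSide) -> &str {
    // Fast path: nothing to strip, hand back the input unchanged.
    if !needs_trim(s, side) {
        return s;
    }
    match side {
        TrimSide::Both => s.trim_matches(' '),
        TrimSide::Left => s.trim_start_matches(' '),
        TrimSide::Right => s.trim_end_matches(' '),
    }
}

/// Trim every non-null value in a column; nulls stay null.
pub fn trim_column<'a>(values: &[Option<&'a str>], side: TrimSide) -> Vec<Option<&'a str>> {
    values
        .iter()
        .copied()
        .map(|value| value.map(|s| trim_value(s, side)))
        .collect()
}
```

   Under these assumptions, `trim_value("  hi  ", TrimSide::Left)` yields 
`"hi  "`, while an input such as `"hi"` takes the fast path and is returned 
as-is. A real Arrow-based version would presumably apply the same check while 
building the output `Utf8` or `LargeUtf8` array.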
   
   ## How are these changes tested?
   
   - Project builds successfully
   - All unit tests pass
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

