neilconway opened a new pull request, #21131:
URL: https://github.com/apache/datafusion/pull/21131

   ## Which issue does this PR close?
   
   - Closes #21129.
   
   ## Rationale for this change
   
   When the delimiter (and the null string, if supplied) are scalars, we can implement `string_to_array` more efficiently. In particular, we can construct a `memmem::Finder` once and reuse it to search each input string for the delimiter.
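
To illustrate the shape of a scalar fast path, here is a minimal sketch and not the code in this PR. The function name `split_rows_scalar` and the `Vec<Option<&str>>` output are hypothetical, `None` stands in for a SQL `NULL` element, and std's `str::split` is swapped in for `memmem::Finder` so the example has no dependencies. It assumes a non-empty delimiter.

```rust
/// Split every row on a single scalar delimiter.
/// Elements equal to `null_str` become `None` (a stand-in for SQL NULL).
/// Hypothetical helper for illustration; assumes `delimiter` is non-empty.
fn split_rows_scalar<'a>(
    rows: &[&'a str],
    delimiter: &str,
    null_str: Option<&str>,
) -> Vec<Vec<Option<&'a str>>> {
    // The delimiter and null string are resolved once, outside the per-row
    // loop. A real implementation could build its searcher here as well.
    rows.iter()
        .map(|row| {
            row.split(delimiter)
                .map(|part| if Some(part) == null_str { None } else { Some(part) })
                .collect()
        })
        .collect()
}
```

Because the delimiter does not vary per row, the only per-row work left is the search itself, which is where a precomputed searcher would be expected to help.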
   
   This PR implements this optimization; it also fixes a place where we were allocating an intermediate `String` for every character when the delimiter is `NULL`. (This isn't a common case, but it is worth fixing.)
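
The `NULL`-delimiter case can be sketched as follows, assuming each character becomes its own element. The helper name `split_chars` is hypothetical and this is not the PR's code; it only shows how one-character substrings can borrow from the input instead of each being allocated as a `String`.

```rust
/// Split `s` into one-character substrings that borrow from `s`.
/// No per-character `String` is allocated; only the outer `Vec` is.
/// Hypothetical helper for illustration.
fn split_chars(s: &str) -> Vec<&str> {
    s.char_indices()
        // `start` is the byte offset of `ch`; `len_utf8` gives its byte
        // width, so the slice always falls on character boundaries.
        .map(|(start, ch)| &s[start..start + ch.len_utf8()])
        .collect()
}
```

Slicing by byte offsets keeps multi-byte UTF-8 characters intact, which a fixed one-byte stride would not.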
   
   Benchmarks (M4 Max):
   
   ```
     single_char_delim/5:    34.8 µs  (was  61.1 µs)  -43%
     single_char_delim/20:  145.1 µs  (was 220.7 µs)  -34%
     single_char_delim/100: 679.4 µs  (was   1.04 ms) -35%
   
     multi_char_delim/5:    41.7 µs  (was  56.7 µs)  -27%
     multi_char_delim/20:  158.9 µs  (was 185.1 µs)  -14%
     multi_char_delim/100: 731.4 µs  (was 858.3 µs)  -15%
   
     with_null_str/5:    43.1 µs  (was  68.7 µs)  -37%
     with_null_str/20:  179.3 µs  (was 244.3 µs)  -27%
     with_null_str/100: 895.8 µs  (was   1.16 ms) -23%
   
     null_delim/5:    17.4 µs  (was  64.1 µs)  -73%
     null_delim/20:   63.0 µs  (was 233.4 µs)  -73%
     null_delim/100: 280.2 µs  (was   1.12 ms) -75%
   
     columnar_delim/5:    65.2 µs  (was  60.2 µs)  +8%
     columnar_delim/20:  217.2 µs  (was 224.1 µs)  -3%
     columnar_delim/100:   1.02 ms  (was   1.05 ms) -3%
   ```
   
   ## What changes are included in this PR?
   
   * Add benchmark for `string_to_array`
   * Implement optimizations described above
   * Refactor the columnar (fallback) path to remove much of the type-dispatch boilerplate
   * Improve SLT test coverage for the "columnar string, scalar other-args" case
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   



