neilconway opened a new pull request, #21131:
URL: https://github.com/apache/datafusion/pull/21131
## Which issue does this PR close?
- Closes #21129.
## Rationale for this change
When the delimiter (and the null string, if supplied) are scalars, we can
implement `string_to_array` more efficiently. In particular, we can construct a
`memmem::Finder` for the delimiter once and use it to search each input string.
This PR implements this optimization. It also fixes a place where we were
allocating an intermediate `String` for every character when the delimiter is
`NULL`. (This isn't a common case, but it's worth fixing.)
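
To make the semantics concrete, here is a minimal, std-only Rust sketch of the scalar-argument case. It assumes PostgreSQL-style `string_to_array` behavior: a `NULL` delimiter splits the input into individual characters, an empty delimiter yields the whole string as one element, and elements equal to the null string become `NULL`. The function `split_rows` and its plain-`Vec` inputs are hypothetical stand-ins for Arrow arrays, not DataFusion's code. The `NULL`-delimiter branch borrows each character as a `&str` sub-slice rather than allocating a `String` per character. For brevity the sketch uses `str::split` per row, where the PR's fast path uses a `memmem::Finder` built from the scalar delimiter.

```rust
/// Hypothetical sketch (not DataFusion's implementation): split each row of a
/// string column by a *scalar* delimiter, turning elements equal to `null_str`
/// into `None`. Plain slices and `Vec`s stand in for Arrow arrays.
fn split_rows<'a>(
    rows: &[Option<&'a str>],
    delimiter: Option<&str>,
    null_str: Option<&str>,
) -> Vec<Option<Vec<Option<&'a str>>>> {
    rows.iter()
        .map(|row| {
            // A NULL input row produces a NULL list.
            let s = (*row)?;
            let parts: Vec<&'a str> = match delimiter {
                // NULL delimiter: one element per character, borrowed as a
                // sub-slice of the input instead of a freshly allocated String.
                None => s
                    .char_indices()
                    .map(|(i, c)| &s[i..i + c.len_utf8()])
                    .collect(),
                // Empty delimiter: the whole string is a single element.
                Some("") => vec![s],
                // Non-empty delimiter: plain substring split. The PR's fast
                // path uses a `memmem::Finder` for this search instead.
                Some(d) => s.split(d).collect(),
            };
            // Elements that match the null string become NULL.
            Some(
                parts
                    .into_iter()
                    .map(|p| if Some(p) == null_str { None } else { Some(p) })
                    .collect(),
            )
        })
        .collect()
}
```

Edge cases such as empty input strings are not treated specially in this sketch.
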
Benchmarks (M4 Max):
```
single_char_delim/5: 34.8 µs (was 61.1 µs) -43%
single_char_delim/20: 145.1 µs (was 220.7 µs) -34%
single_char_delim/100: 679.4 µs (was 1.04 ms) -35%
multi_char_delim/5: 41.7 µs (was 56.7 µs) -27%
multi_char_delim/20: 158.9 µs (was 185.1 µs) -14%
multi_char_delim/100: 731.4 µs (was 858.3 µs) -15%
with_null_str/5: 43.1 µs (was 68.7 µs) -37%
with_null_str/20: 179.3 µs (was 244.3 µs) -27%
with_null_str/100: 895.8 µs (was 1.16 ms) -23%
null_delim/5: 17.4 µs (was 64.1 µs) -73%
null_delim/20: 63.0 µs (was 233.4 µs) -73%
null_delim/100: 280.2 µs (was 1.12 ms) -75%
columnar_delim/5: 65.2 µs (was 60.2 µs) +8%
columnar_delim/20: 217.2 µs (was 224.1 µs) -3%
columnar_delim/100: 1.02 ms (was 1.05 ms) -3%
```
## What changes are included in this PR?
* Add benchmark for `string_to_array`
* Implement optimizations described above
* Refactor the columnar (fallback) path to remove much of the type-dispatch boilerplate
* Improve SLT test coverage for the "columnar string, scalar other-args" case
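
The fast path applies only when the delimiter and null string are constant for the whole batch; otherwise the columnar fallback handles each row. A minimal sketch of that dispatch decision follows, using a hypothetical `Arg` enum (loosely analogous in spirit to DataFusion's `ColumnarValue`, but not its API) and a hypothetical `choose_path` helper. Under these assumptions, a `NULL` scalar delimiter still counts as scalar, which is consistent with the `null_delim` benchmarks improving while `columnar_delim` stays roughly flat.

```rust
/// Hypothetical stand-in for an argument that is either a single value for the
/// whole batch or one value per row. This is NOT DataFusion's `ColumnarValue`.
enum Arg<'a> {
    Scalar(Option<&'a str>),
    Array(Vec<Option<&'a str>>),
}

impl<'a> Arg<'a> {
    fn is_scalar(&self) -> bool {
        matches!(self, Arg::Scalar(_))
    }

    /// Value seen by row `row`: a scalar is broadcast to every row.
    fn value_at(&self, row: usize) -> Option<&'a str> {
        match self {
            Arg::Scalar(v) => *v,
            Arg::Array(vs) => vs[row],
        }
    }
}

#[derive(Debug, PartialEq)]
enum Path {
    /// Delimiter (and null string, if given) fixed for the whole batch, so
    /// setup such as building a substring searcher can be done once.
    ScalarFast,
    /// At least one of them varies per row: fall back to per-row handling.
    Columnar,
}

fn choose_path(delimiter: &Arg<'_>, null_str: Option<&Arg<'_>>) -> Path {
    // An absent null string counts as "scalar" for this purpose.
    if delimiter.is_scalar() && null_str.map_or(true, |n| n.is_scalar()) {
        Path::ScalarFast
    } else {
        Path::Columnar
    }
}
```
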
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.