[GitHub] [arrow-datafusion] isidentical opened a new issue, #3613: Optimize regex_replace with a known pattern / replacement

GitBox Sun, 25 Sep 2022 08:39:01 -0700


isidentical opened a new issue, #3613:
URL: https://github.com/apache/arrow-datafusion/issues/3613


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Currently `regex_replace` assumes that even scalar pattern/replacement
arguments are arrays (since they have been cast that way). This makes every
call to `regex_replace` pay significant overhead: instead of iterating only
over the values that actually vary from row to row (e.g. the source string),
we iterate over all of the arguments. As a result, we pay for per-replacement
pre-processing and, more importantly, for the hash map used to cache compiled
regexes (unnecessary lookups).
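   A minimal Rust sketch of the shape described above, under loud assumptions:
the function and helper names are hypothetical (this is not DataFusion's code),
and to stay dependency-free, literal substring replacement via `str::replace`
stands in for real regex matching. It shows that once a scalar pattern has been
expanded into an array, every row still pays for a hash-map lookup even though
only one pattern is ever compiled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for regex compilation. Assumption: to avoid the
/// `regex` crate, the "compiled" pattern is just the literal pattern text.
fn compile(pattern: &str) -> String {
    pattern.to_string()
}

/// Hypothetical model of the array-only shape: a scalar pattern or
/// replacement has already been expanded to one value per row, so every row
/// pays for a cache lookup even when the pattern never changes.
/// Returns (output, cache lookups, compilations).
fn regex_replace_arrays(
    source: &[&str],
    pattern: &[&str],
    replacement: &[&str],
) -> (Vec<String>, usize, usize) {
    let mut cache: HashMap<&str, String> = HashMap::new();
    let mut lookups = 0;
    let mut out = Vec::with_capacity(source.len());
    for ((s, p), r) in source.iter().zip(pattern).zip(replacement) {
        lookups += 1; // one hash-map lookup per row
        let re = cache.entry(*p).or_insert_with(|| compile(p));
        // Literal replacement stands in for `Regex::replace_all`.
        out.push(s.replace(re.as_str(), r));
    }
    // Each distinct pattern is compiled exactly once.
    let compilations = cache.len();
    (out, lookups, compilations)
}
```

   For three rows that all share the pattern `"-"`, this performs three lookups
but only one compilation; the lookups are the redundant work this issue is about.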
   
   **Describe the solution you'd like**
   We can determine whether the pattern and replacement arguments are scalars
and, if so, add a new case to the implementation that handles a scalar pattern
and replacement. Initially we might apply this check only to the pattern, but
the optimization can later be extended to the replacement as well (the
overhead of `replacement` is relatively small, but it is still nonzero).
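   The proposed dispatch could look roughly like this sketch. The names are
hypothetical (`Arg` is only loosely modeled on DataFusion's `ColumnarValue`),
and literal substring replacement again stands in for regex matching. A scalar
pattern is compiled once and the loop runs over the source strings only; the
pattern-only arm mirrors the "initially only patterns" step, and the last arm
keeps the per-row cache for patterns that genuinely vary.

```rust
use std::collections::HashMap;

/// Hypothetical argument shape: one value for the whole batch (scalar) or
/// one value per row (array). Not DataFusion's actual type.
enum Arg<'a> {
    Scalar(&'a str),
    Array(&'a [&'a str]),
}

impl<'a> Arg<'a> {
    /// Value for a given row; a scalar is the same for every row.
    fn value(&self, row: usize) -> &'a str {
        match self {
            Arg::Scalar(v) => *v,
            Arg::Array(vs) => vs[row],
        }
    }
}

/// Hypothetical stand-in for regex compilation (assumption: literal
/// substring replacement via `str::replace` replaces regex matching).
fn compile(pattern: &str) -> String {
    pattern.to_string()
}

/// Returns the replaced strings and how many patterns were compiled.
fn regex_replace(source: &[&str], pattern: &Arg<'_>, replacement: &Arg<'_>) -> (Vec<String>, usize) {
    match (pattern, replacement) {
        // Both known up front: compile once, loop over the source only.
        (Arg::Scalar(p), Arg::Scalar(r)) => {
            let re = compile(p);
            let out: Vec<String> = source.iter().map(|s| s.replace(re.as_str(), r)).collect();
            (out, 1)
        }
        // Only the pattern is scalar: still compile once with no cache
        // lookups, but read the replacement per row.
        (Arg::Scalar(p), _) => {
            let re = compile(p);
            let out: Vec<String> = source
                .iter()
                .enumerate()
                .map(|(i, s)| s.replace(re.as_str(), replacement.value(i)))
                .collect();
            (out, 1)
        }
        // General case: per-row pattern, cached by pattern text.
        _ => {
            let mut cache: HashMap<&str, String> = HashMap::new();
            let out: Vec<String> = source
                .iter()
                .enumerate()
                .map(|(i, s)| {
                    let p = pattern.value(i);
                    let re = cache.entry(p).or_insert_with(|| compile(p));
                    s.replace(re.as_str(), replacement.value(i))
                })
                .collect();
            let compiled = cache.len();
            (out, compiled)
        }
    }
}
```

   With a scalar pattern such as `Arg::Scalar("-")`, the compile count stays at
1 regardless of batch size and no hash map is touched, while the general arm
compiles once per distinct pattern and looks one up per row.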
   
   **Describe alternatives you've considered**
   We could leave it as is, but this seems to be a common case, and it is also
one where we perform particularly badly in the ClickHouse benchmark.
   
   **Additional context**
   The main issue on `regex_replace` is #3518, by @Dandandan.
   
   
   


-- 
This is an automated message from the Apache Git Service.