isidentical opened a new issue, #3613: URL: https://github.com/apache/arrow-datafusion/issues/3613
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Currently `regex_replace` assumes that even scalar pattern/replacement arguments are arrays (since they have been cast that way). This makes each call to `regex_replace` pay a significant overhead: instead of iterating only over the arguments that actually vary per row (e.g. the source string), we iterate over everything. As a result, we pay for per-row preprocessing of the replacement and, more importantly, for the hashmap used to cache compiled regexes (unnecessary lookups).

**Describe the solution you'd like**

We can determine whether the pattern and replacement arguments are scalars and, if so, add a new case to the implementation that handles scalar pattern and replacement. Initially we might want to apply this check only to the pattern, but the optimization can later be extended to the replacement as well (the overhead of `replacement` is relatively small, but it is still nonzero).

**Describe alternatives you've considered**

We can leave it as is, but this seems to be a common case, and it is also something we perform particularly badly on in the ClickHouse benchmark.

**Additional context**

The main `regex_replace` issue was opened by @Dandandan in #3518.
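
To illustrate the idea, here is a minimal sketch of the two code paths in Python rather than DataFusion's Rust. It is not the actual implementation: the function names `replace_per_row` and `replace_scalar_pattern` are hypothetical, plain lists stand in for Arrow arrays, and replacing only the first match is an assumption made for the example.

```python
import re


def replace_per_row(sources, patterns, replacements):
    """Array path: pattern and replacement are full-length columns.

    Every row zips three values and does a cache lookup, even when
    the pattern column holds the same value in every row.
    """
    cache = {}  # pattern string -> compiled regex
    out = []
    for src, pat, rep in zip(sources, patterns, replacements):
        if src is None or pat is None or rep is None:
            out.append(None)  # null in, null out
            continue
        compiled = cache.get(pat)
        if compiled is None:
            compiled = re.compile(pat)
            cache[pat] = compiled
        # Assumption: replace only the first match.
        out.append(compiled.sub(rep, src, count=1))
    return out


def replace_scalar_pattern(sources, pattern, replacement):
    """Scalar path: compile once, then iterate only over the source column."""
    if pattern is None or replacement is None:
        return [None] * len(sources)
    compiled = re.compile(pattern)  # compiled exactly once, no cache needed
    return [
        None if src is None else compiled.sub(replacement, src, count=1)
        for src in sources
    ]


if __name__ == "__main__":
    sources = ["foo bar", None, "bar bar"]
    # Array path: the scalar arguments were expanded to one value per row.
    slow = replace_per_row(sources, ["bar"] * 3, ["X"] * 3)
    # Scalar path: the arguments are passed as plain values.
    fast = replace_scalar_pattern(sources, "bar", "X")
    print(slow == fast)
```

Both paths return the same result. The scalar path simply skips the per-row cache lookup and the iteration over the expanded pattern and replacement columns.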
