[GitHub] [arrow-datafusion] isidentical commented on issue #3518: Improve performance of `regex_replace`

GitBox Fri, 07 Oct 2022 14:20:30 -0700


isidentical commented on issue #3518:
URL: https://github.com/apache/arrow-datafusion/issues/3518#issuecomment-1272095620


> It might be that regexes themselves are so expensive, that the "null buffer" reuse has minimal benefit.
   
Initial profiling indicates that, even with a very simple regex, the query in that example spends roughly 25% of its total execution time in `into_array`. This is because we still go through the adapter even in the specialized mode, and the adapter expands scalar arguments into full-length arrays.

https://cs.github.com/apache/arrow-datafusion/blob/15289610318e4acad48e40f5adabe0c5a9e8f9b9/datafusion/physical-expr/src/regex_expressions.rs#L307
   
So perhaps we could extend the initial adapter so that it can also receive hints about whether the arrays need to be padded or not.
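
A minimal Rust sketch of the hinting idea, under stated assumptions: `ColumnarValue` below is a simplified, hypothetical stand-in for DataFusion's type (a string column is just `Vec<Option<String>>`), and `Hint`, `adapt_args`, and `replace_kernel` are hypothetical names, not existing DataFusion APIs. The kernel uses a literal `str::replace` in place of a regex so the example needs no external crates; it does not reproduce the SQL function's regex semantics.

```rust
// Hypothetical, simplified stand-in for DataFusion's ColumnarValue:
// a string column is a Vec<Option<String>>, a scalar is a single Option<String>.
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Array(Vec<Option<String>>),
    Scalar(Option<String>),
}

impl ColumnarValue {
    /// Materialize as an array; a scalar is repeated `num_rows` times.
    /// This per-row copy is the padding cost the hints aim to skip.
    fn into_array(self, num_rows: usize) -> Vec<Option<String>> {
        match self {
            ColumnarValue::Array(a) => a,
            ColumnarValue::Scalar(s) => vec![s; num_rows],
        }
    }
}

/// Hypothetical per-argument hint telling the adapter whether to pad.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Hint {
    Pad,
    AcceptsScalar,
}

/// Hypothetical hint-aware adapter: arguments hinted `Pad` are materialized
/// into arrays, arguments hinted `AcceptsScalar` are passed through as-is.
fn adapt_args(args: Vec<ColumnarValue>, hints: &[Hint], num_rows: usize) -> Vec<ColumnarValue> {
    assert_eq!(args.len(), hints.len(), "expected one hint per argument");
    args.into_iter()
        .zip(hints.iter())
        .map(|(arg, hint)| match hint {
            Hint::Pad => ColumnarValue::Array(arg.into_array(num_rows)),
            Hint::AcceptsScalar => arg,
        })
        .collect()
}

/// Toy kernel (hypothetical): literal `str::replace` stands in for a regex.
/// Scalar pattern/replacement values are read once instead of once per row.
fn replace_kernel(args: &[ColumnarValue]) -> Vec<Option<String>> {
    let strings = match &args[0] {
        ColumnarValue::Array(a) => a,
        ColumnarValue::Scalar(_) => panic!("first argument is expected to be an array"),
    };
    match (&args[1], &args[2]) {
        // Fast path: scalar pattern and replacement, no padded arrays needed.
        (ColumnarValue::Scalar(pat), ColumnarValue::Scalar(rep)) => match (pat, rep) {
            (Some(p), Some(r)) => strings
                .iter()
                .map(|s| s.as_ref().map(|s| s.replace(p.as_str(), r.as_str())))
                .collect(),
            // A null pattern or replacement yields null for every row.
            _ => vec![None; strings.len()],
        },
        // General path: per-row pattern and replacement arrays.
        (ColumnarValue::Array(pats), ColumnarValue::Array(reps)) => strings
            .iter()
            .zip(pats.iter().zip(reps.iter()))
            .map(|(s, (p, r))| match (s, p, r) {
                (Some(s), Some(p), Some(r)) => Some(s.replace(p.as_str(), r.as_str())),
                _ => None,
            })
            .collect(),
        _ => unimplemented!("mixed scalar/array arguments are out of scope for this sketch"),
    }
}
```

With hints `[Hint::Pad, Hint::AcceptsScalar, Hint::AcceptsScalar]`, only the input column goes through padding, while the pattern and replacement stay scalar, so no `num_rows` copies are allocated for them. The trade-off is that every kernel that accepts a scalar hint must also provide a scalar fast path.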
   


