isidentical commented on issue #3518: URL: https://github.com/apache/arrow-datafusion/issues/3518#issuecomment-1272095620
> It might be that regexes themselves are so expensive, that the "null buffer" reuse has minimal benefit.

Initial profiling indicates that even with a very simple regex, the query in that example spends roughly 25% of its total execution time in `into_array`, because we still go through the adapter even in the specialized mode: https://cs.github.com/apache/arrow-datafusion/blob/15289610318e4acad48e40f5adabe0c5a9e8f9b9/datafusion/physical-expr/src/regex_expressions.rs#L307

So perhaps we could extend the initial adapter so that it can also receive hints about whether the arrays need to be padded.
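To make the hint idea concrete, here is a minimal, standalone sketch of an adapter that pads scalar arguments only when asked to. It is not DataFusion code: `Value`, `Hint`, and `adapt` are hypothetical names invented for illustration, and plain `Vec<Option<String>>` stands in for Arrow arrays.

```rust
// Hypothetical, simplified stand-in for a columnar value: either one scalar
// that logically applies to every row, or a materialized array.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Scalar(Option<String>),
    Array(Vec<Option<String>>),
}

// Hypothetical per-argument hint passed to the adapter.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Hint {
    // The kernel needs a full-length array, so scalars must be expanded.
    Pad,
    // The kernel can consume a scalar directly, so no expansion is needed.
    AcceptsScalar,
}

impl Value {
    // Expands a scalar into `num_rows` copies; arrays are returned unchanged.
    // This is the costly step the hint is meant to avoid.
    pub fn into_array(self, num_rows: usize) -> Vec<Option<String>> {
        match self {
            Value::Scalar(s) => vec![s; num_rows],
            Value::Array(a) => a,
        }
    }
}

// Pads only the arguments whose hint is `Pad`. Arguments without a hint
// fall back to `Pad`, which keeps the unhinted behavior of always padding.
pub fn adapt(args: Vec<Value>, hints: &[Hint], num_rows: usize) -> Vec<Value> {
    args.into_iter()
        .enumerate()
        .map(|(i, arg)| match hints.get(i).copied().unwrap_or(Hint::Pad) {
            Hint::Pad => Value::Array(arg.into_array(num_rows)),
            Hint::AcceptsScalar => arg,
        })
        .collect()
}
```

With hints `[Hint::Pad, Hint::AcceptsScalar]`, a scalar regex pattern in the second position would reach the kernel unexpanded, while an empty hint slice would pad everything as before.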
