neilconway opened a new issue, #20384: URL: https://github.com/apache/datafusion/issues/20384
### Is your feature request related to a problem or challenge?

When `array_has_any` is passed a scalar for either of its arguments, we can use a much faster algorithm. Rather than doing O(N*M) comparisons for each row of the columnar argument (where N and M are the lengths of the two arrays being compared), we can build a hash table on the scalar array once and probe it with each row's elements instead.

### Describe the solution you'd like

_No response_

### Describe alternatives you've considered

_No response_

### Additional context

#18181 discusses a user-reported query where `array_has_any` is slow. In that scenario, `array_has_any` is called on a table column and an uncorrelated subquery. The subquery result is currently passed to `array_has_any` as one of a pair of columnar arguments, so we don't take advantage of the fact that the subquery argument is effectively fixed.

Optimizing that query involves two steps:

1. Optimize `array_has_any` for a scalar argument, which is this ticket. This has value as a standalone optimization.
2. Improve the query optimizer to handle this general class of queries better. I'll do some more digging here and file another ticket shortly.
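The hash-based approach from the problem statement can be sketched in plain Rust. This is a minimal illustration, not DataFusion's implementation: it assumes each row's array is a `Vec<i64>` with no nulls (real DataFusion code operates on Arrow list arrays and must handle nulls), and the function names `array_has_any_naive` and `array_has_any_scalar` are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical naive version: for every row, compare each element of the
/// row's array against each element of the scalar array (O(N*M) per row).
fn array_has_any_naive(rows: &[Vec<i64>], scalar: &[i64]) -> Vec<bool> {
    rows.iter()
        .map(|row| row.iter().any(|x| scalar.iter().any(|y| x == y)))
        .collect()
}

/// Hypothetical hash-based version: build a hash set from the scalar array
/// once, then probe it with each element of each row (O(N) expected per row).
/// Assumes non-null i64 elements; null semantics are deliberately ignored.
fn array_has_any_scalar(rows: &[Vec<i64>], scalar: &[i64]) -> Vec<bool> {
    // Build phase: done once, independent of the number of rows.
    let set: HashSet<i64> = scalar.iter().copied().collect();
    // Probe phase: one hash lookup per element of each row.
    rows.iter()
        .map(|row| row.iter().any(|x| set.contains(x)))
        .collect()
}

fn main() {
    let rows = vec![vec![1, 2, 3], vec![4, 5], vec![], vec![7, 9]];
    let scalar = [3, 9, 10];
    let fast = array_has_any_scalar(&rows, &scalar);
    // Both strategies must agree on every row.
    assert_eq!(fast, array_has_any_naive(&rows, &scalar));
    println!("{:?}", fast); // [true, false, false, true]
}
```

Ignoring null handling, "do the two arrays share any element" is symmetric. So the same build-and-probe strategy applies whichever argument is the scalar: the scalar side is always the one hashed, and the columnar side is probed row by row.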
