alamb opened a new issue, #18181: URL: https://github.com/apache/datafusion/issues/18181
### Is your feature request related to a problem or challenge?

While working on
- https://github.com/apache/datafusion/issues/18070

@ianthetechie provided the following query, which executes quite slowly:

```sql
CREATE EXTERNAL TABLE categories_raw
STORED AS PARQUET
LOCATION 's3://fsq-os-places-us-east-1/release/dt=2025-09-09/categories/parquet/';

CREATE EXTERNAL TABLE places
STORED AS PARQUET
LOCATION 's3://fsq-os-places-us-east-1/release/dt=2025-09-09/places/parquet/';

WITH categories_arr AS (
    SELECT array_agg(category_id) AS category_ids
    FROM categories_raw
    LIMIT 500
)
SELECT COUNT(*)
FROM places p
WHERE date_refreshed >= CURRENT_DATE - INTERVAL '365 days'
  AND array_has_any(p.fsq_category_ids, (SELECT category_ids FROM categories_arr));
```

While the regression in https://github.com/apache/datafusion/issues/18070 was fixed, there is still a lot of room to improve this query's performance.

To reproduce, download [slow_array_has.zip](https://github.com/user-attachments/files/23004864/slow_array_has.zip) and run:

```shell
datafusion-cli -f repro.sql
```

60% of the overall query time is spent in `array_has`, as can be seen in this quick profile:

<img width="1832" height="1364" alt="Image" src="https://github.com/user-attachments/assets/f5073c67-d8cf-40de-b563-b040f26072b4" />

### Describe the solution you'd like

Make `array_has` go faster

### Describe alternatives you've considered

@jayzhan211 has some ideas here in
- https://github.com/apache/datafusion/issues/12163

### Additional context

_No response_
