Vedin commented on PR #16933: URL: https://github.com/apache/datafusion/pull/16933#issuecomment-3184394834
Hi @haohuaijin, I accidentally worked on the same feature. The approach is the same. So, I don't want anyhow conflict with your contribution and will just wait until your PR is merged. I just want to highlight 2 cases that probably are not currently covered in your PR. 1. Aggregate functions with `QUALIFY` ```sql CREATE TABLE qt (i INT, p VARCHAR, o INT) AS VALUES (1, 'A', 1), (2, 'A', 2), (3, 'B', 1), (4, 'B', 2); SELECT p, SUM(o) AS s FROM qt GROUP BY p QUALIFY RANK() OVER (ORDER BY s DESC) = 1 ORDER BY p; ``` 2. Constant filter + QUALIFY ```sql CREATE TABLE web_base_events_this_run ( domain_sessionid VARCHAR, app_id VARCHAR, page_view_id VARCHAR, derived_tstamp TIMESTAMP, dvce_created_tstamp TIMESTAMP, event_id VARCHAR ) AS SELECT * FROM VALUES ('ds1', 'appA', NULL, '2025-01-01 10:00:00'::timestamp, '2025-01-01 10:05:00'::timestamp, 'e1'), ('ds1', 'appA', NULL, '2025-01-01 11:00:00'::timestamp, '2025-01-01 11:00:00'::timestamp, 'e2'), ('ds1', 'appA', 'pv', '2025-01-01 12:00:00'::timestamp, '2025-01-01 12:00:00'::timestamp, 'e3'), ('ds2', 'appB', NULL, '2025-01-01 09:00:00'::timestamp, '2025-01-01 09:10:00'::timestamp, 'e4'), ('ds2', 'appB', NULL, '2025-01-01 09:05:00'::timestamp, '2025-01-01 09:09:00'::timestamp, 'e5'); SELECT domain_sessionid, app_id FROM web_base_events_this_run WHERE page_view_id IS NULL QUALIFY ROW_NUMBER() OVER ( PARTITION BY domain_sessionid ORDER BY derived_tstamp, dvce_created_tstamp, event_id ) = 1 ORDER BY domain_sessionid; ``` I covered the first one by adding `qualify` expressions to `aggr_expr_haystack`: ```rust let aggr_expr_haystack = select_exprs .iter() .chain(having_expr_opt.iter()) .chain(qualify_expr_opt_pre_aggr.iter()); ``` The second one required me to change the logic in `common_subexpr_eliminate.rs`. You can check it out here (https://github.com/Embucket/datafusion/pull/34/). Maybe you'll come up with a better solution. Hope this helps. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org