Vedin commented on PR #16933:
URL: https://github.com/apache/datafusion/pull/16933#issuecomment-3184394834

   Hi @haohuaijin, I accidentally worked on the same feature. The approach is 
the same. So, I don't want anyhow conflict with your contribution and will just 
wait until your PR is merged. I just want to highlight 2 cases that probably 
are not currently covered in your PR. 
   1. Aggregate functions with `QUALIFY` 
   ```sql
   CREATE TABLE qt (i INT, p VARCHAR, o INT) AS VALUES
     (1, 'A', 1),
     (2, 'A', 2),
     (3, 'B', 1),
     (4, 'B', 2);
   
   SELECT p, SUM(o) AS s
   FROM qt
   GROUP BY p
   QUALIFY RANK() OVER (ORDER BY s DESC) = 1
   ORDER BY p;
   ```
   2. Constant filter + QUALIFY
   ```sql
   CREATE TABLE web_base_events_this_run (
     domain_sessionid VARCHAR,
     app_id VARCHAR,
     page_view_id VARCHAR,
     derived_tstamp TIMESTAMP,
     dvce_created_tstamp TIMESTAMP,
     event_id VARCHAR
   ) AS SELECT * FROM VALUES
     ('ds1', 'appA', NULL, '2025-01-01 10:00:00'::timestamp, '2025-01-01 
10:05:00'::timestamp, 'e1'),
     ('ds1', 'appA', NULL, '2025-01-01 11:00:00'::timestamp, '2025-01-01 
11:00:00'::timestamp, 'e2'),
     ('ds1', 'appA', 'pv', '2025-01-01 12:00:00'::timestamp, '2025-01-01 
12:00:00'::timestamp, 'e3'),
     ('ds2', 'appB', NULL, '2025-01-01 09:00:00'::timestamp, '2025-01-01 
09:10:00'::timestamp, 'e4'),
     ('ds2', 'appB', NULL, '2025-01-01 09:05:00'::timestamp, '2025-01-01 
09:09:00'::timestamp, 'e5');
     
     SELECT domain_sessionid, app_id
   FROM web_base_events_this_run
   WHERE page_view_id IS NULL
   QUALIFY ROW_NUMBER() OVER (
     PARTITION BY domain_sessionid
     ORDER BY derived_tstamp, dvce_created_tstamp, event_id
   ) = 1
   ORDER BY domain_sessionid;
   ```
   I covered the first one by adding `qualify` expressions to 
`aggr_expr_haystack`:
   ```rust
           let aggr_expr_haystack = select_exprs
               .iter()
               .chain(having_expr_opt.iter())
               .chain(qualify_expr_opt_pre_aggr.iter());
   ```
   The second one required me to change the logic in 
`common_subexpr_eliminate.rs`. You can check it out here 
(https://github.com/Embucket/datafusion/pull/34/). Maybe you'll come up with a 
better solution. 
   Hope this helps.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to