[GitHub] [arrow-datafusion] avantgardnerio commented on issue #7198: Optimize SELECT min/max queries with limit

via GitHub Sat, 05 Aug 2023 05:02:15 -0700


avantgardnerio commented on issue #7198:
URL: 
https://github.com/apache/arrow-datafusion/issues/7198#issuecomment-1666485998


   > if we can get a general purpose optimization that also handles queries 
with different aggregates
   
   I don't think #7192 can handle your example. It works by "evicting" (nice 
term @tustvold ) groups from the accumulator unless they are the current 
min/max.
   
   So if we run your example:
   
   ```
   SELECT tag, field, max(time), min(other_field)
   FROM t
   GROUP BY tag
   ORDER BY max(time) DESC
   LIMIT 10
   ```
   
   on:
   ```
   +-----+-------+-------+-------------+
   | tag | time  | field | other_field |
   +-----+-------+-------+-------------+
   | 1   | 01:00 |       | 0           |
   | 2   | 02:00 |       | 1           |
   | 3   | 03:00 |       | 1           |
   | 4   | 04:00 |       | 1           |
   | 5   | 05:00 |       | 1           |
   | 6   | 06:00 |       | 1           |
   | 7   | 07:00 |       | 1           |
   | 8   | 08:00 |       | 1           |
   | 9   | 09:00 |       | 1           |
   | 10  | 10:00 |       | 1           |
   | 11  | 11:00 |       | 1           |
   | 1   | 12:00 |       | 999         |
   +-----+-------+-------+-------------+
   ```
   
   We will:
   1. accumulate 10 tag-groups
   2. run into tag-group 11 with a greater `time` that group-tag 1 (`11:00` vs 
`01:00`)
   3. evict tag-group 1 from our accumulator, along with it's `other_field=0` 
value
   4. run into tag-group 1 again with a greater `time` than tag-group 2 
(`12:00` vs `02:00`)
   5. evict tag-group 2, replace it with tag-group 1, and it's new "min" of 
`999`
   
   Hopefully this example makes it clear why we can only accumulate values 
present in the `order by` clause, given the approach in #7192 based on the 
functional requirements (not-sorting) of #7191 .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] avantgardnerio commented on issue #7198: Optimize SELECT min/max queries with limit

Reply via email to