Re: [I] The `limit` info lost in the AggregateExec when ser/deser the physical plan [datafusion]

via GitHub Thu, 23 May 2024 02:06:33 -0700


liukun4515 commented on issue #10630:
URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126605415


   > But I found another issue in `AggregateExec`: when we push the limit down to the aggregate exec, it selects `GroupedTopKAggregateStream`:
   > 
   > ```
   >         if let Some(limit) = self.limit {
   >             warn!("agg exec: {}", self.is_unordered_unfiltered_group_by_distinct());
   >             if !self.is_unordered_unfiltered_group_by_distinct() {
   >                 warn!("agg exec: create GroupedPriorityQueue");
   >                 return Ok(StreamType::GroupedPriorityQueue(
   >                     GroupedTopKAggregateStream::new(self, context, partition, limit)?,
   >                 ));
   >             }
   >         }
   > ```
   > 
   > The `GroupedTopKAggregateStream` implementation gets the right result for the SQL, but it is not efficient: we don't care about the order, and we don't need to consume all of the input data.
   
   In our SQL:
   ```
   select
     LO_SUPPKEY
   from
     SSB_1G.LINEORDER
   GROUP BY
     LO_SUPPKEY
   limit 20  offset 10
   ```
   There is no ORDER BY clause and no aggregate expression, so we don't need `GroupedTopKAggregateStream` to get the result; it is not efficient for this SQL.
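To make the point concrete, here is a minimal Rust sketch of what this query actually needs. It is not DataFusion code: the function name `distinct_keys_with_limit`, the `Vec<u64>` "batches", and keeping groups in first-seen order are all assumptions for illustration. Because there is no ORDER BY, any `offset + limit` distinct keys (30 for `limit 20 offset 10`) are a valid answer, so reading can stop as soon as that many groups have been seen.

```rust
use std::collections::HashSet;

/// Hypothetical model of `GROUP BY key LIMIT limit OFFSET offset` with no
/// aggregate expressions and no ORDER BY. Returns the selected keys and the
/// number of input batches that were actually pulled.
fn distinct_keys_with_limit<I>(batches: I, offset: usize, limit: usize) -> (Vec<u64>, usize)
where
    I: IntoIterator<Item = Vec<u64>>,
{
    let needed = offset + limit;
    let mut seen = HashSet::new();
    // First-seen order, so OFFSET skips a well-defined prefix of the groups.
    let mut groups = Vec::new();
    let mut batches_read = 0;

    for batch in batches {
        batches_read += 1;
        for key in batch {
            if groups.len() == needed {
                break;
            }
            if seen.insert(key) {
                groups.push(key);
            }
        }
        if groups.len() >= needed {
            // Early termination: the remaining batches are never pulled.
            break;
        }
    }

    let result: Vec<u64> = groups.into_iter().skip(offset).take(limit).collect();
    (result, batches_read)
}

fn main() {
    // The first two batches already contain enough distinct keys.
    let batches = vec![vec![1, 2, 2, 3], vec![3, 4, 5, 1], vec![6, 7, 8], vec![9, 10]];
    let (keys, read) = distinct_keys_with_limit(batches, 1, 3);
    // prints: keys = [2, 3, 4], batches read = 2
    println!("keys = {:?}, batches read = {}", keys, read);
}
```

With a lazy input (an iterator over a child stream rather than a `Vec`), the early `break` means the last two batches are never produced at all.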
   
   `GroupedTopKAggregateStream` consumes all of the input data and uses a `PriorityQueue` to store and sort it.
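For contrast, a simplified model of a top-K grouping strategy. This is not DataFusion's actual `GroupedTopKAggregateStream`; as an assumption, groups are ranked by the key itself, kept in a `std::collections::BinaryHeap`, and the function name `top_k_distinct_keys` is hypothetical. It shows why such a strategy cannot stop early: a key that belongs in the top K may arrive in the very last batch.

```rust
use std::collections::{BinaryHeap, HashSet};

/// Hypothetical top-K model: keeps the `k` smallest distinct keys in a
/// max-heap, so the largest retained key is always at the top and can be
/// evicted. Every batch must be read before the result is final.
fn top_k_distinct_keys<I>(batches: I, k: usize) -> (Vec<u64>, usize)
where
    I: IntoIterator<Item = Vec<u64>>,
{
    let mut heap: BinaryHeap<u64> = BinaryHeap::new(); // max-heap of retained keys
    let mut in_heap: HashSet<u64> = HashSet::new(); // dedup: one entry per group
    let mut batches_read = 0;

    for batch in batches {
        batches_read += 1;
        for key in batch {
            if in_heap.contains(&key) {
                continue;
            }
            if heap.len() < k {
                heap.push(key);
                in_heap.insert(key);
            } else if let Some(&largest) = heap.peek() {
                // Replace the current largest key if this one ranks higher.
                if key < largest {
                    heap.pop();
                    in_heap.remove(&largest);
                    heap.push(key);
                    in_heap.insert(key);
                }
            }
        }
    }
    // `into_sorted_vec` returns the keys in ascending order.
    (heap.into_sorted_vec(), batches_read)
}

fn main() {
    // The smallest key only shows up in the final batch.
    let batches = vec![vec![5, 6, 7], vec![8, 9], vec![1]];
    let (keys, read) = top_k_distinct_keys(batches, 2);
    // prints: top-k = [1, 5], batches read = 3
    println!("top-k = {:?}, batches read = {}", keys, read);
}
```

For an unordered `GROUP BY ... LIMIT`, this full scan plus heap maintenance is wasted work compared with stopping once enough distinct groups have been collected.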
   
   
   

