liukun4515 commented on issue #10630:
URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2126605415
> Another issue I found is in `AggregateExec`: when the limit is pushed down to
> the aggregate exec, this branch selects the
>
> ```
> if let Some(limit) = self.limit {
>     warn!("agg exec: {}", self.is_unordered_unfiltered_group_by_distinct());
>     if !self.is_unordered_unfiltered_group_by_distinct() {
>         warn!("agg exec: create GroupedPriorityQueue");
>         return Ok(StreamType::GroupedPriorityQueue(
>             GroupedTopKAggregateStream::new(self, context, partition, limit)?,
>         ));
>     }
> }
> ```
>
> `GroupedTopKAggregateStream`.
>
> `GroupedTopKAggregateStream` produces the correct result for this SQL, but it
> is inefficient: the query does not care about ordering, so there is no need
> to consume all of the input data.
In our SQL:
```
select
LO_SUPPKEY
from
SSB_1G.LINEORDER
GROUP BY
LO_SUPPKEY
limit 20 offset 10
```
There is no sort/order clause and no aggregate expression, so we don't need
`GroupedTopKAggregateStream` to get the result; it is not efficient for this
SQL. `GroupedTopKAggregateStream` consumes all of the data and uses a
`PriorityQueue` to store and sort it.
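
To make the point concrete, here is a minimal standalone sketch of the early-stop behaviour I would expect for an unordered, unfiltered `GROUP BY` with `LIMIT`/`OFFSET`. It is not DataFusion code: `distinct_with_limit` is a hypothetical helper, and the `i64` keys are just a stand-in for `LO_SUPPKEY`. It only illustrates that once `offset + limit` distinct groups have been seen, the rest of the input does not need to be read.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): collects distinct keys from
/// `input` and stops as soon as `offset + limit` distinct keys have been seen.
/// Returns the selected keys and the number of input rows consumed.
fn distinct_with_limit<I>(input: I, offset: usize, limit: usize) -> (Vec<i64>, usize)
where
    I: IntoIterator<Item = i64>,
{
    let needed = offset + limit;
    let mut seen = HashSet::new();
    let mut groups = Vec::new(); // distinct keys in first-seen order
    let mut consumed = 0;

    if needed > 0 {
        for key in input {
            consumed += 1;
            // `insert` returns true only the first time a key is seen.
            if seen.insert(key) {
                groups.push(key);
                if groups.len() == needed {
                    break; // enough groups: skip the rest of the input
                }
            }
        }
    }

    // Apply the OFFSET after the early stop.
    let out = groups.into_iter().skip(offset).collect();
    (out, consumed)
}

fn main() {
    // 1,000,000 rows cycling over 100 distinct keys.
    let rows = (0..1_000_000i64).map(|i| i % 100);
    let (keys, consumed) = distinct_with_limit(rows, 10, 20);
    println!("returned {} keys after reading {} rows", keys.len(), consumed);
}
```

With this synthetic input, the sketch returns 20 keys after reading only 30 rows, whereas a priority-queue approach has to read all 1,000,000 rows before it can emit anything.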
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.