liukun4515 commented on issue #10630: URL: https://github.com/apache/datafusion/issues/10630#issuecomment-2133119074
I think PR https://github.com/apache/datafusion/pull/7192 introduced the top_k aggregation, backed by a `priority queue` in the `AggregateExec`, and it is used to optimize queries like the pattern below:

```
select column, sum(xx) from table group by column order by column
```

PR https://github.com/apache/datafusion/pull/8038 then introduced a new rule, `push limit for distinct column`, which uses `is_unordered_unfiltered_group_by_distinct` to check that the plan has no `sort` condition. This rule is used to optimize queries like:

```
select distinct column from table
select column from table group by column
```

However, PR https://github.com/apache/datafusion/pull/8038 cannot reduce the output of the `AggregateExec` in those cases, because the `GroupedHashAggregateStream` cannot handle a `limit` on its output. I think we can implement this feature.
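To make the idea concrete, here is a minimal sketch of what limit-aware grouping could look like for the distinct case. It is not DataFusion code: `distinct_with_limit` is a hypothetical helper, group keys are modeled as plain hashable values rather than Arrow arrays, and it assumes the query has no aggregate expressions and no ordering, so any `limit` distinct groups are a valid answer.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper (not part of DataFusion): collect distinct group keys
/// and stop reading input as soon as `limit` groups have been found.
/// Returns the groups in first-seen order and the number of input rows consumed.
fn distinct_with_limit<K, I>(rows: I, limit: usize) -> (Vec<K>, usize)
where
    K: Eq + Hash + Clone,
    I: IntoIterator<Item = K>,
{
    let mut seen: HashSet<K> = HashSet::new();
    let mut groups: Vec<K> = Vec::new();
    let mut consumed = 0;

    // A limit of zero needs no input at all.
    if limit == 0 {
        return (groups, consumed);
    }

    for key in rows {
        consumed += 1;
        // `insert` returns true only when the key was not already present.
        if seen.insert(key.clone()) {
            groups.push(key);
            // Enough groups: with no aggregates and no ordering, later rows
            // cannot change the result, so stop early.
            if groups.len() == limit {
                break;
            }
        }
    }
    (groups, consumed)
}
```

For example, calling `distinct_with_limit(vec!["a", "b", "a", "c", "d", "e"], 3)` stops after the fourth row instead of scanning all six. The early exit is only safe because there is nothing to accumulate per group; a query with `sum(xx)` would still have to read every row.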