[ https://issues.apache.org/jira/browse/SPARK-30523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30523:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> Collapse back to back aggregations into a single aggregate to reduce the number of shuffles
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30523
>                 URL: https://issues.apache.org/jira/browse/SPARK-30523
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Jason Altekruse
>            Priority: Major
>
> Queries containing nested aggregate operations can in some cases be computed
> with a single phase of aggregation. This Jira seeks to introduce a new
> optimizer rule that identifies some of those cases and rewrites the plan, so
> that the data is not needlessly re-shuffled and the aggregation hash table is
> not built twice.
> Some examples of collapsible aggregates:
> {code:sql}
> SELECT sum(sumAgg) as a, year from (
>   select sum(1) as sumAgg, course, year FROM courseSales GROUP BY course, year
> ) group by year
> -- can be collapsed to
> SELECT sum(1) as `a`, year from courseSales group by year
> {code}
> {code:sql}
> SELECT sum(agg), min(a), b from (
>   select sum(1) as agg, a, b FROM testData2 GROUP BY a, b
> ) group by b
> -- can be collapsed to
> SELECT sum(1) as `sum(agg)`, min(a) as `min(a)`, b from testData2 group by b
> {code}
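
The two rewrites in the issue description can be sanity-checked outside Spark. The following is a minimal sketch, not part of the original issue: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL and only confirms that the nested and collapsed queries return the same rows. It says nothing about shuffles or about the proposed optimizer rule itself. The sample rows, the `earnings` column, the `run` helper, and the `t` subquery alias are assumptions made for illustration. The collapse appears to be valid here because the outer grouping keys are a subset of the inner ones, a sum of partial sums equals the total sum, and `min(a)` over the distinct values of `a` equals `min(a)` over all rows.

```python
import sqlite3


def run(conn, sql):
    """Run a query and return its rows sorted, so results compare order-independently."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")

# Hypothetical sample data; table and column names follow the issue text,
# the rows and the earnings column are assumptions.
conn.execute("CREATE TABLE courseSales (course TEXT, year INTEGER, earnings REAL)")
conn.executemany(
    "INSERT INTO courseSales VALUES (?, ?, ?)",
    [
        ("dotNET", 2012, 10000.0),
        ("Java", 2012, 20000.0),
        ("dotNET", 2012, 5000.0),
        ("dotNET", 2013, 48000.0),
        ("Java", 2013, 30000.0),
    ],
)
conn.execute("CREATE TABLE testData2 (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO testData2 VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)],
)

# Example 1: sum of per-(course, year) counts versus a direct per-year count.
nested_1 = """
    SELECT sum(sumAgg) AS a, year FROM (
        SELECT sum(1) AS sumAgg, course, year FROM courseSales GROUP BY course, year
    ) AS t GROUP BY year
"""
collapsed_1 = "SELECT sum(1) AS a, year FROM courseSales GROUP BY year"

# Example 2: sum and min over a nested (a, b) grouping versus a direct grouping by b.
nested_2 = """
    SELECT sum(agg), min(a), b FROM (
        SELECT sum(1) AS agg, a, b FROM testData2 GROUP BY a, b
    ) AS t GROUP BY b
"""
collapsed_2 = "SELECT sum(1), min(a), b FROM testData2 GROUP BY b"

nested_rows_1, collapsed_rows_1 = run(conn, nested_1), run(conn, collapsed_1)
nested_rows_2, collapsed_rows_2 = run(conn, nested_2), run(conn, collapsed_2)

assert nested_rows_1 == collapsed_rows_1
assert nested_rows_2 == collapsed_rows_2

print(collapsed_rows_1)  # [(2, 2013), (3, 2012)]
print(collapsed_rows_2)  # [(3, 1, 1), (3, 1, 2)]
```

Not every aggregate collapses this way. An outer `avg` over inner sums, for example, would generally not equal a single-level aggregate, which is presumably why the issue speaks of identifying only "some of those cases".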