Jason Altekruse created SPARK-30523: ---------------------------------------
Summary: Collapse back to back aggregations into a single aggregate to reduce the number of shuffles Key: SPARK-30523 URL: https://issues.apache.org/jira/browse/SPARK-30523 Project: Spark Issue Type: Improvement Components: Optimizer Affects Versions: 3.0.0 Reporter: Jason Altekruse Queries containing nested aggregate operations can in some cases be computable with a single phase of aggregation. This Jira seeks to introduce a new optimizer rule to identify some of those cases and rewrite plans to avoid needlessly re-shuffling and generating the aggregation hash table data twice. Some examples of collapsible aggregates: {code:java} SELECT sum(sumAgg) as a, year from ( select sum(1) as sumAgg, course, year FROM courseSales GROUP BY course, year ) group by year // can be collapsed to SELECT sum(1) as `a`, year from courseSales group by year {code} {code} SELECT sum(agg), min(a), b from ( select sum(1) as agg, a, b FROM testData2 GROUP BY a, b ) group by b // can be collapsed to SELECT sum(1) as `sum(agg)`, min(a) as `min(a)`, b from testData2 group by b {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org