Sean Zhong created SPARK-17289:
----------------------------------

    Summary: Sort based partial aggregation breaks due to SPARK-12978
    Key: SPARK-17289
    URL: https://issues.apache.org/jira/browse/SPARK-17289
    Project: Spark
    Issue Type: Bug
    Reporter: Sean Zhong

For the following query:

{code}
(0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2")
spark.sql("select max(b) from t2 group by a").explain(true)
{code}

Currently, the planner does not insert a Sort operator before the partial SortAggregate. Sort-based aggregation assumes its input is ordered by the grouping key so that rows of the same group are adjacent, so this breaks sort-based partial aggregation.

{code}
== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- LocalTableScan [a#5, b#6]
{code}

In the Spark 2.0 branch, the plan is:

{code}
== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- *Sort [a#5 ASC], false, 0
            +- LocalTableScan [a#5, b#6]
{code}

This is related to SPARK-12978.
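
To illustrate why the missing Sort matters, here is a minimal sketch in plain Scala with no Spark dependency. It is a toy model, not Spark's implementation: `SortAggSketch` and `sortBasedMax` are hypothetical names, and the function only mimics the general idea of a sort-based aggregate that emits a group whenever the key changes. It uses the same data as the query above.

```scala
// Toy model (an assumption, not Spark code): a sort-based aggregate that
// expects rows with the same key to be adjacent and emits one result row
// every time the key changes.
object SortAggSketch {
  def sortBasedMax(rows: Seq[(Int, String)]): Seq[(Int, String)] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[(Int, String)]
    var current: Option[(Int, String)] = None
    for ((key, value) <- rows) {
      current match {
        case Some((k, m)) if k == key =>
          // Same group as the previous row: keep the larger string.
          current = Some((k, if (value.compareTo(m) > 0) value else m))
        case Some(group) =>
          // Key changed: flush the finished group and start a new one.
          out += group
          current = Some((key, value))
        case None =>
          // First row of the input.
          current = Some((key, value))
      }
    }
    // Flush the last open group, if any.
    current.foreach(g => out += g)
    out.toSeq
  }

  // Same data as the query in the report: keys alternate 0, 1, 0, 1, ...
  val rows: Seq[(Int, String)] = (0 to 1000).map(x => (x % 2, x.toString))

  // Without a preceding sort, every row starts a new group.
  val unsorted: Seq[(Int, String)] = sortBasedMax(rows)

  // With a sort on the key, each key yields exactly one row.
  val sorted: Seq[(Int, String)] = sortBasedMax(rows.sortBy(_._1))
}
```

In this model, `SortAggSketch.unsorted` contains 1001 rows (one per input row, so nothing is reduced), while `SortAggSketch.sorted` contains two rows, `(0, "998")` and `(1, "999")`, because `max` compares the values as strings.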