[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15446615#comment-15446615 ]
Apache Spark commented on SPARK-17289:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/14865

> Sort based partial aggregation breaks due to SPARK-12978
> --------------------------------------------------------
>
>         Key: SPARK-17289
>         URL: https://issues.apache.org/jira/browse/SPARK-17289
>     Project: Spark
>  Issue Type: Bug
>    Reporter: Sean Zhong
>    Priority: Blocker
>
> For the following query:
> {code}
> val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2")
> spark.sql("select max(b) from t2 group by a").explain(true)
> {code}
> The SortAggregator no longer inserts a Sort operator before partial
> aggregation, which breaks sort-based partial aggregation:
> {code}
> == Physical Plan ==
> SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
> +- *Sort [a#5 ASC], false, 0
>    +- Exchange hashpartitioning(a#5, 200)
>       +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
>          +- LocalTableScan [a#5, b#6]
> {code}
> In the Spark 2.0 branch, the plan is:
> {code}
> == Physical Plan ==
> SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
> +- *Sort [a#5 ASC], false, 0
>    +- Exchange hashpartitioning(a#5, 200)
>       +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
>          +- *Sort [a#5 ASC], false, 0
>             +- LocalTableScan [a#5, b#6]
> {code}
> This is related to SPARK-12978.
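
To illustrate why the missing Sort matters, here is a minimal toy sketch in plain Scala. It is not Spark's SortAggregate implementation; the object `SortAggSketch` and the method `sortBasedPartialMax` are hypothetical names invented for this example. It assumes that a sort-based aggregate streams over its input and closes a group whenever the grouping key changes, so on unsorted input it emits one partial row per run of equal keys rather than one per key.

```scala
// Toy model only: hypothetical names, not Spark's implementation.
object SortAggSketch {
  // Streaming max-by-key that ASSUMES rows arrive grouped by key.
  // A new output row is started every time the key differs from the
  // key of the most recently emitted row.
  def sortBasedPartialMax(rows: Seq[(Int, String)]): Vector[(Int, String)] =
    rows.foldLeft(Vector.empty[(Int, String)]) { (acc, row) =>
      acc.lastOption match {
        case Some((k, m)) if k == row._1 =>
          // Same key as the current run: update the running max.
          val newMax = if (row._2.compareTo(m) > 0) row._2 else m
          acc.init :+ ((k, newMax))
        case _ =>
          // Key changed (or first row): start a new group.
          acc :+ row
      }
    }

  def main(args: Array[String]): Unit = {
    // Same shape as the query in the report, on a smaller range.
    val rows = (0 to 9).map(x => (x % 2, x.toString))

    // Keys alternate 0,1,0,1,..., so every row becomes its own "group".
    println(sortBasedPartialMax(rows).size)

    // With the input sorted on the key first, there is one row per key.
    println(sortBasedPartialMax(rows.sortBy(_._1)))
  }
}
```

In this toy model the unsorted input yields ten partial rows, while the sorted input yields one row per key. Whether the real operator fails in the same way is an assumption here; the plans above only show that the Sort below the partial SortAggregate is missing.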