[jira] [Updated] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-17289: --- Assignee: Takeshi Yamamuro > Sort based partial aggregation breaks due to SPARK-12978 > > > Key: SPARK-17289 > URL: https://issues.apache.org/jira/browse/SPARK-17289 > Project: Spark > Issue Type: Bug >Reporter: Sean Zhong >Assignee: Takeshi Yamamuro >Priority: Blocker > Fix For: 2.1.0 > > > For the following query: > {code} > val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", > "b").createOrReplaceTempView("t2") > spark.sql("select max(b) from t2 group by a").explain(true) > {code} > Now, the SortAggregator won't insert Sort operator before partial > aggregation, this will break sort-based partial aggregation. > {code} > == Physical Plan == > SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) > +- *Sort [a#5 ASC], false, 0 >+- Exchange hashpartitioning(a#5, 200) > +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, > max#19]) > +- LocalTableScan [a#5, b#6] > {code} > In Spark 2.0 branch, the plan is: > {code} > == Physical Plan == > SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) > +- *Sort [a#5 ASC], false, 0 >+- Exchange hashpartitioning(a#5, 200) > +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, > max#19]) > +- *Sort [a#5 ASC], false, 0 > +- LocalTableScan [a#5, b#6] > {code} > This is related to SPARK-12978 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-17289: -- Priority: Blocker (was: Major) > Sort based partial aggregation breaks due to SPARK-12978 > > > Key: SPARK-17289 > URL: https://issues.apache.org/jira/browse/SPARK-17289 > Project: Spark > Issue Type: Bug >Reporter: Sean Zhong >Priority: Blocker > > For the following query: > {code} > val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", > "b").createOrReplaceTempView("t2") > spark.sql("select max(b) from t2 group by a").explain(true) > {code} > Now, the SortAggregator won't insert Sort operator before partial > aggregation, this will break sort-based partial aggregation. > {code} > == Physical Plan == > SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) > +- *Sort [a#5 ASC], false, 0 >+- Exchange hashpartitioning(a#5, 200) > +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, > max#19]) > +- LocalTableScan [a#5, b#6] > {code} > In Spark 2.0 branch, the plan is: > {code} > == Physical Plan == > SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) > +- *Sort [a#5 ASC], false, 0 >+- Exchange hashpartitioning(a#5, 200) > +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, > max#19]) > +- *Sort [a#5 ASC], false, 0 > +- LocalTableScan [a#5, b#6] > {code} > This is related to SPARK-12978 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Zhong updated SPARK-17289: --- Description: For the following query: {code} val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2") spark.sql("select max(b) from t2 group by a").explain(true) {code} Now, the SortAggregator won't insert Sort operator before partial aggregation, this will break sort-based partial aggregation. {code} == Physical Plan == SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) +- *Sort [a#5 ASC], false, 0 +- Exchange hashpartitioning(a#5, 200) +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19]) +- LocalTableScan [a#5, b#6] {code} In Spark 2.0 branch, the plan is: {code} == Physical Plan == SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) +- *Sort [a#5 ASC], false, 0 +- Exchange hashpartitioning(a#5, 200) +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19]) +- *Sort [a#5 ASC], false, 0 +- LocalTableScan [a#5, b#6] {code} This is related to SPARK-12978 was: For the following query: {code} val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2") spark.sql("select max(b) from t2 group by a").explain(true) {code} Now, the SortAggregator won't insert Sort operator before partial aggregation, this will break sort-based partial aggregation. {code} == Physical Plan == SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) +- *Sort [a#5 ASC], false, 0 +- Exchange hashpartitioning(a#5, 200) +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19]) +- LocalTableScan [a#5, b#6] {code} In Spark 2.0 branch, the plan is: {code} == Physical Plan == SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) +- *Sort [a#5 ASC], false, 0 +- Exchange hashpartitioning(a#5, 200) +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19]) +- *Sort [a#5 ASC], false, 0 +- LocalTableScan [a#5, b#6] {code} This is related with SPARK-12978 > Sort based partial aggregation breaks due to SPARK-12978 > > > Key: SPARK-17289 > URL: https://issues.apache.org/jira/browse/SPARK-17289 > Project: Spark > Issue Type: Bug >Reporter: Sean Zhong > > For the following query: > {code} > val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", > "b").createOrReplaceTempView("t2") > spark.sql("select max(b) from t2 group by a").explain(true) > {code} > Now, the SortAggregator won't insert Sort operator before partial > aggregation, this will break sort-based partial aggregation. > {code} > == Physical Plan == > SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) > +- *Sort [a#5 ASC], false, 0 >+- Exchange hashpartitioning(a#5, 200) > +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, > max#19]) > +- LocalTableScan [a#5, b#6] > {code} > In Spark 2.0 branch, the plan is: > {code} > == Physical Plan == > SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17]) > +- *Sort [a#5 ASC], false, 0 >+- Exchange hashpartitioning(a#5, 200) > +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, > max#19]) > +- *Sort [a#5 ASC], false, 0 > +- LocalTableScan [a#5, b#6] > {code} > This is related to SPARK-12978 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org