[jira] [Commented] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446615#comment-15446615 ]

Apache Spark commented on SPARK-17289:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/14865

> Sort based partial aggregation breaks due to SPARK-12978
> --------------------------------------------------------
>
>                 Key: SPARK-17289
>                 URL: https://issues.apache.org/jira/browse/SPARK-17289
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Sean Zhong
>            Priority: Blocker
>
> For the following query:
> {code}
> val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2")
> spark.sql("select max(b) from t2 group by a").explain(true)
> {code}
> Now, the SortAggregator won't insert Sort operator before partial aggregation, this will break sort-based partial aggregation.
> {code}
> == Physical Plan ==
> SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
> +- *Sort [a#5 ASC], false, 0
>    +- Exchange hashpartitioning(a#5, 200)
>       +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
>          +- LocalTableScan [a#5, b#6]
> {code}
> In Spark 2.0 branch, the plan is:
> {code}
> == Physical Plan ==
> SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
> +- *Sort [a#5 ASC], false, 0
>    +- Exchange hashpartitioning(a#5, 200)
>       +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
>          +- *Sort [a#5 ASC], false, 0
>             +- LocalTableScan [a#5, b#6]
> {code}
> This is related to SPARK-12978

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
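The issue description says that a missing Sort below the partial SortAggregate breaks sort-based partial aggregation. A minimal sketch of why input order matters, assuming a toy aggregator that merges only adjacent rows with equal keys; `SortAggSketch` and `aggregateAdjacent` are hypothetical names for illustration and are not Spark code:

```scala
// Toy model only (not Spark's SortAggregate): an aggregator that, like a
// sort-based aggregate, merges a row into the current group only when the
// row's key equals the key of the immediately preceding row.
object SortAggSketch {
  // Computes max(b) for every run of adjacent rows sharing the same key a.
  def aggregateAdjacent(rows: Seq[(Int, String)]): Seq[(Int, String)] =
    rows.foldLeft(Vector.empty[(Int, String)]) {
      case (acc, (k, v)) if acc.nonEmpty && acc.last._1 == k =>
        // Same key as the previous row: fold the value into the open group.
        acc.init :+ (k -> Ordering[String].max(v, acc.last._2))
      case (acc, row) =>
        // Key changed (or first row): start a new group.
        acc :+ row
    }

  def main(args: Array[String]): Unit = {
    // Same shape as the data in the report, but smaller: keys alternate 0, 1, 0, 1, ...
    val rows = (0 to 5).map(x => (x % 2, x.toString))
    println(aggregateAdjacent(rows))              // unsorted input: one group per row
    println(aggregateAdjacent(rows.sortBy(_._1))) // sorted input: one group per key
  }
}
```

With alternating keys, the unsorted input never has two adjacent rows with the same key, so the toy aggregator emits as many rows as it consumes; sorting by the grouping key first collapses the input to one row per key.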
[jira] [Commented] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446344#comment-15446344 ]

Takeshi Yamamuro commented on SPARK-17289:
------------------------------------------

okay. I'll add tests and open pr.
[jira] [Commented] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446341#comment-15446341 ]

Herman van Hovell commented on SPARK-17289:
-------------------------------------------

Looks good. Can you open a PR?
[jira] [Commented] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446323#comment-15446323 ]

Takeshi Yamamuro commented on SPARK-17289:
------------------------------------------

This is probably because EnsureRequirements does not check whether the partial aggregation satisfies its sort requirements.
We can fix this like this:
https://github.com/apache/spark/compare/master...maropu:SPARK-17289#diff-cdb577e36041e4a27a605b6b3063fd54L167

cc: [~hvanhovell]
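The check described in the comment above can be illustrated with a toy planner rule. This is only a sketch under stated assumptions: `Plan`, `Scan`, `Sort`, `SortAgg`, and `ensureOrdering` are hypothetical names invented here, orderings are modelled as plain column-name lists, and none of this is the code of Spark's EnsureRequirements or of the linked patch.

```scala
// Toy model only: a rule that inserts a Sort below a sort-based aggregate
// whenever the child's output ordering does not already start with the
// grouping keys. All types here are hypothetical, not Spark classes.
object EnsureOrderingSketch {
  sealed trait Plan { def outputOrdering: Seq[String] }

  // A leaf that produces rows in no particular order.
  case class Scan(columns: Seq[String]) extends Plan {
    def outputOrdering: Seq[String] = Nil
  }
  // Sorts its child by the given keys.
  case class Sort(keys: Seq[String], child: Plan) extends Plan {
    def outputOrdering: Seq[String] = keys
  }
  // A sort-based aggregate: it needs its input ordered by the grouping keys.
  case class SortAgg(keys: Seq[String], child: Plan) extends Plan {
    def outputOrdering: Seq[String] = keys
  }

  // Walks the plan and adds a Sort under every SortAgg whose requirement
  // is not met by the (already fixed) child.
  def ensureOrdering(plan: Plan): Plan = plan match {
    case SortAgg(keys, child) =>
      val fixedChild = ensureOrdering(child)
      val satisfied = fixedChild.outputOrdering.startsWith(keys)
      SortAgg(keys, if (satisfied) fixedChild else Sort(keys, fixedChild))
    case Sort(keys, child) => Sort(keys, ensureOrdering(child))
    case scan: Scan        => scan
  }

  def main(args: Array[String]): Unit = {
    // Partial aggregate directly over an unordered scan, as in the broken plan.
    println(ensureOrdering(SortAgg(Seq("a"), Scan(Seq("a", "b")))))
  }
}
```

In this toy model, applying the rule to an aggregate over an unordered scan yields an aggregate over `Sort(List(a), ...)`, while a plan whose child is already sorted on the grouping keys is returned unchanged.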
[jira] [Commented] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446059#comment-15446059 ]

Takeshi Yamamuro commented on SPARK-17289:
------------------------------------------

yea, I'll check this.
[jira] [Commented] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446013#comment-15446013 ]

Herman van Hovell commented on SPARK-17289:
-------------------------------------------

cc [~maropu]
[jira] [Commented] (SPARK-17289) Sort based partial aggregation breaks due to SPARK-12978
[ https://issues.apache.org/jira/browse/SPARK-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446015#comment-15446015 ]

Herman van Hovell commented on SPARK-17289:
-------------------------------------------

[~clockfly] Are you working on this one?