[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19301 I believe this has been fixed, can we close it?

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19301 Can one of the admins verify this patch?

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19301 Can one of the admins verify this patch?

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19301 @stanzhai Thanks. I see. Because the aggregation functions are bound to individual buffer slots, they are recognized as different expressions and won't be eliminated.
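
The point about buffer slots can be illustrated with a toy model in plain Scala. This is only a sketch under stated assumptions: `Column`, `Sum`, `BoundAgg`, and `bind` are hypothetical names made up for illustration, not Catalyst classes, and structural equality of case classes stands in for whatever comparison the real elimination pass performs.

```scala
object SlotSketch {
  // Hypothetical toy expression types; these are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Sum(child: Expr) extends Expr
  // An aggregate tied to one position in the aggregation buffer.
  final case class BoundAgg(agg: Expr, slot: Int) extends Expr

  // Give every aggregate its own buffer slot, in order of appearance.
  def bind(aggs: Seq[Expr]): Seq[BoundAgg] =
    aggs.zipWithIndex.map { case (agg, slot) => BoundAgg(agg, slot) }

  def main(args: Array[String]): Unit = {
    val aggs: Seq[Expr] = Seq.fill(3)(Sum(Column("id")))
    // Before binding, the three sums are structurally equal.
    println(aggs.distinct.size)       // prints 1
    // After binding, each one carries a different slot, so none compares equal.
    println(bind(aggs).distinct.size) // prints 3
  }
}
```

In this model, once each `sum(id)` carries its own slot index, an equality-based pass no longer sees duplicates, which matches the behaviour described in the comment above.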

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-22 Thread stanzhai
Github user stanzhai commented on the issue: https://github.com/apache/spark/pull/19301 @viirya Benchmark code: ```scala val N = 500L << 22 val benchmark = new Benchmark("agg", N) val expressions = (0 until 50).map(i => s"sum(id) as r$i")

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19301 I asked because, with subexpressionElimination taken into account, the expression may not actually be evaluated multiple times. The benchmark numbers can tell whether your fix really improves performance.

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-22 Thread stanzhai
Github user stanzhai commented on the issue: https://github.com/apache/spark/pull/19301 @viirya The problem is already obvious: the same aggregate expression will be computed multiple times. I will provide a benchmark result later.
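
As a rough illustration of the idea that repeated aggregate expressions need only be computed once, here is a minimal plain-Scala sketch. It is not the code of the patch: `Column`, `Sum`, and `dedup` are hypothetical names, and structural equality of case classes is assumed as a stand-in for a semantic comparison of expressions.

```scala
object DedupSketch {
  // Hypothetical toy expression types; these are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Sum(child: Expr) extends Expr

  // Keep the first occurrence of each structurally equal aggregate.
  def dedup(aggs: Seq[Expr]): Seq[Expr] = aggs.distinct

  def main(args: Array[String]): Unit = {
    // sum(mt_cnt) appears twice, as it would when reused inside another expression.
    val aggs: Seq[Expr] =
      Seq(Sum(Column("mt_cnt")), Sum(Column("ele_cnt")), Sum(Column("mt_cnt")))
    println(dedup(aggs).size) // prints 2
  }
}
```

`Seq.distinct` preserves first-occurrence order, so the surviving aggregates stay in the order in which the query lists them.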

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19301 Regarding the performance regression, I think you should post benchmark numbers.

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread stanzhai
Github user stanzhai commented on the issue: https://github.com/apache/spark/pull/19301 @cenyuhai This is an optimization of the physical plan, and your case can be optimized. ```SQL select dt, geohash_of_latlng, sum(mt_cnt), sum(ele_cnt), round(sum(mt_cnt) *

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread cenyuhai
Github user cenyuhai commented on the issue: https://github.com/apache/spark/pull/19301 Should `sum(mt_cnt)` and `sum(ele_cnt)` be computed again?

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread cenyuhai
Github user cenyuhai commented on the issue: https://github.com/apache/spark/pull/19301 I don't know whether my case can be optimized or not.

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread cenyuhai
Github user cenyuhai commented on the issue: https://github.com/apache/spark/pull/19301 my case: ```sql select dt, geohash_of_latlng, sum(mt_cnt), sum(ele_cnt), round(sum(mt_cnt) * 1.0 * 100 / sum(mt_cnt_all), 2), round(sum(ele_cnt) * 1.0 * 100 /

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread stanzhai
Github user stanzhai commented on the issue: https://github.com/apache/spark/pull/19301 https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala#L211 ```scala val aggregateExpressions =

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19301 Can you explain more about how this bug happens?

[GitHub] spark issue #19301: [SPARK-22084][SQL] Fix performance regression in aggrega...

2017-09-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19301 Can one of the admins verify this patch?