Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19301
I believe this has been fixed, can we close it?
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19301
Can one of the admins verify this patch?
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19301
@stanzhai Thanks. I see. Because the aggregation functions are bound to
individual buffer slots, they are recognized as different expressions and won't
be eliminated.
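A minimal, self-contained sketch of this point (plain Scala; `BoundAgg` and its fields are hypothetical stand-ins, not Spark's actual `AggregateExpression`): when the buffer slot participates in equality, two otherwise-identical `sum(id)` expressions compare unequal, so `distinct` keeps both.

```scala
// Hypothetical stand-in for a bound aggregate expression; Spark's real
// AggregateExpression is far more involved.
case class BoundAgg(func: String, input: String, bufferSlot: Int)

object SlotBindingDemo {
  def main(args: Array[String]): Unit = {
    // The same logical aggregate, bound to two different buffer slots:
    val exprs = Seq(BoundAgg("sum", "id", bufferSlot = 0),
                    BoundAgg("sum", "id", bufferSlot = 1))

    // Case-class equality includes the slot, so nothing is eliminated:
    assert(exprs.distinct.size == 2)

    // Comparing only the slot-free part collapses them to one:
    assert(exprs.map(e => (e.func, e.input)).distinct.size == 1)
  }
}
```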
---
Github user stanzhai commented on the issue:
https://github.com/apache/spark/pull/19301
@viirya
Benchmark code:
```scala
val N = 500L << 22
val benchmark = new Benchmark("agg", N)
val expressions = (0 until 50).map(i => s"sum(id) as r$i")
```
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19301
I asked because, with subexpression elimination, the expression may not
actually be evaluated multiple times. Benchmark numbers can tell whether your
fix really improves performance.
---
Github user stanzhai commented on the issue:
https://github.com/apache/spark/pull/19301
@viirya The problem is already apparent: the same aggregate expression
will be computed multiple times. I will provide a benchmark result later.
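For illustration only (a toy evaluator, not Spark internals): counting invocations shows the cost of evaluating one aggregate once per alias versus deduplicating first.

```scala
// Toy model: 50 result expressions are all aliases of the same sum,
// and we count how often the underlying aggregate is evaluated.
object DuplicateEvalDemo {
  var evalCount = 0
  def evalSum(rows: Seq[Long]): Long = { evalCount += 1; rows.sum }

  def main(args: Array[String]): Unit = {
    val rows = 1L to 100L
    val resultExprs = Seq.fill(50)("sum(id)") // 50 aliases of one aggregate

    // Naive physical plan: one evaluation per occurrence.
    resultExprs.foreach(_ => evalSum(rows))
    assert(evalCount == 50)

    // Deduplicated plan: evaluate each distinct aggregate once, reuse it.
    evalCount = 0
    val cache = resultExprs.distinct.map(e => e -> evalSum(rows)).toMap
    val results = resultExprs.map(cache)
    assert(evalCount == 1)
    assert(results.forall(_ == 5050))
  }
}
```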
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19301
Regarding performance regression, I think you should post benchmark numbers.
---
Github user stanzhai commented on the issue:
https://github.com/apache/spark/pull/19301
@cenyuhai This is an optimization of the physical plan, and your case can be
optimized.
```SQL
select dt,
geohash_of_latlng,
sum(mt_cnt),
sum(ele_cnt),
round(sum(mt_cnt) *
```
---
Github user cenyuhai commented on the issue:
https://github.com/apache/spark/pull/19301
Should `sum(mt_cnt)` and `sum(ele_cnt)` be computed again?
---
Github user cenyuhai commented on the issue:
https://github.com/apache/spark/pull/19301
I don't know whether my case can be optimized or not.
---
Github user cenyuhai commented on the issue:
https://github.com/apache/spark/pull/19301
my case:
```sql
select dt,
geohash_of_latlng,
sum(mt_cnt),
sum(ele_cnt),
round(sum(mt_cnt) * 1.0 * 100 / sum(mt_cnt_all), 2),
round(sum(ele_cnt) * 1.0 * 100 /
```
---
Github user stanzhai commented on the issue:
https://github.com/apache/spark/pull/19301
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala#L211
```scala
val aggregateExpressions =
```
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19301
can you explain more about how this bug happens?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19301
Can one of the admins verify this patch?
---