[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21443


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21443#discussion_r191306683
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ---
@@ -151,7 +152,7 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
   }
 
   // Setup unique distinct aggregate children.
-  val distinctAggChildren = 
distinctAggGroups.keySet.flatten.toSeq.distinct
+  val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq
--- End diff --

This is not related to this pr though, I dropped `.distinct` because it 
does nothing (`keySet.flatten` is already a set)?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21443#discussion_r191306423
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -687,4 +687,12 @@ class DataFrameAggregateSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-24369 multiple distinct aggregations having the same 
argument set") {
+val df = sql(
+  s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
--- End diff --

ok


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21443#discussion_r191302814
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ---
@@ -151,7 +152,7 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
   }
 
   // Setup unique distinct aggregate children.
-  val distinctAggChildren = 
distinctAggGroups.keySet.flatten.toSeq.distinct
+  val distinctAggChildren = distinctAggGroups.keySet.flatten.toSeq
--- End diff --

Is this needed?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21443#discussion_r191302155
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -687,4 +687,12 @@ class DataFrameAggregateSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-24369 multiple distinct aggregations having the same 
argument set") {
+val df = sql(
+  s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
--- End diff --

Move it to SQLQueryTestSuite?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21443: [SPARK-24369][SQL] Correct handling for multiple ...

2018-05-28 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/21443

[SPARK-24369][SQL] Correct handling for multiple distinct aggregations 
having the same argument set

## What changes were proposed in this pull request?
This pr fixed an issue when having multiple distinct aggregations having 
the same argument set, e.g.,
```
scala>: paste
val df = sql(
  s"""SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*)
 | FROM (VALUES (1, 1), (2, 2), (2, 2)) t(x, y)
   """.stripMargin)

java.lang.RuntimeException
You hit a query analyzer bug. Please report your query to Spark user 
mailing list.
```
The root cause is that `RewriteDistinctAggregates` can't detect multiple 
distinct aggregations if they have the same argument set. This pr modified code 
so that `RewriteDistinctAggregates` could count the number of aggregate 
expressions with `isDistinct=true`.

## How was this patch tested?
Added tests in `DataFrameAggregateSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-24369

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21443.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21443


commit 00f6ad9547f462fd0cc3377cdd3aee44be19ffaf
Author: Takeshi Yamamuro 
Date:   2018-05-28T14:54:21Z

Fix




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org