[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...

2018-06-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21473


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...

2018-05-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21473#discussion_r192292014
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1744,11 +1744,14 @@ class Analyzer(
*it into the plan tree.
*/
   object ExtractWindowExpressions extends Rule[LogicalPlan] {
-private def hasWindowFunction(projectList: Seq[NamedExpression]): 
Boolean =
-  projectList.exists(hasWindowFunction)
+private def hasWindowFunction(exprs: Seq[Expression]): Boolean =
+  exprs.exists(hasWindowFunction)
 
-private def hasWindowFunction(expr: NamedExpression): Boolean = {
+private def hasWindowFunction(expr: Expression): Boolean = {
   expr.find {
+case AggregateExpression(aggFunc, _, _, _) if 
hasWindowFunction(aggFunc.children) =>
--- End diff --

It's weird to throw exception inside a boolean function `hasXXX`.

Can we do this check in 
https://github.com/apache/spark/pull/21473/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R1838
 ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...

2018-05-31 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21473#discussion_r192284247
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -687,4 +687,29 @@ class DataFrameAggregateSuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-21896: Window functions inside aggregate functions") {
+def checkWindowError(df: => DataFrame): Unit = {
+  val thrownException = the [AnalysisException] thrownBy {
+df.queryExecution.analyzed
+  }
+  assert(thrownException.message.contains("not allowed to use a window 
function"))
+}
+
+
checkWindowError(testData2.select(min(avg('b).over(Window.partitionBy('a)
+checkWindowError(testData2.agg(sum('b), 
max(rank().over(Window.orderBy('a)
+checkWindowError(testData2.groupBy('a).agg(sum('b), 
max(rank().over(Window.orderBy('b)
+
checkWindowError(testData2.groupBy('a).agg(max(sum(sum('b)).over(Window.orderBy('b)
+
+checkWindowError(
+  sql("SELECT MAX(RANK() OVER(ORDER BY b)) FROM testData2 GROUP BY a 
HAVING SUM(b) = 3"))
+checkWindowError(
+  sql("SELECT MAX(RANK() OVER(ORDER BY a)) FROM testData2"))
+checkWindowError(
+  sql("SELECT MAX(RANK() OVER(ORDER BY b)) FROM testData2 GROUP BY a"))
+checkAnswer(
+  sql("SELECT a, MAX(b), RANK() OVER(ORDER BY a) FROM testData2 GROUP 
BY a HAVING SUM(b) = 3"),
--- End diff --

 I think the dataset version should be 
```
df.groupBy('a).agg(max('b), sum('b).as("sumb"), 
rank().over(window)).where('sumb === 5)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...

2018-05-31 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/21473#discussion_r192234621
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1744,11 +1744,14 @@ class Analyzer(
*it into the plan tree.
*/
   object ExtractWindowExpressions extends Rule[LogicalPlan] {
-private def hasWindowFunction(projectList: Seq[NamedExpression]): 
Boolean =
-  projectList.exists(hasWindowFunction)
+private def hasWindowFunction(exprs: Seq[Expression]): Boolean =
+  exprs.exists(hasWindowFunction)
 
-private def hasWindowFunction(expr: NamedExpression): Boolean = {
+private def hasWindowFunction(expr: Expression): Boolean = {
   expr.find {
+case AggregateExpression(aggFunc, _, _, _) if 
hasWindowFunction(aggFunc.children) =>
--- End diff --

I have some doubts that this is the best place for this check. 
StackOverflow happens in ``extract``. We can also define a separate method and 
call it inside ``extract``. However, that method will share the same structure 
as ``hasWindowFunction``.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...

2018-05-31 Thread aokolnychyi
GitHub user aokolnychyi opened a pull request:

https://github.com/apache/spark/pull/21473

[SPARK-21896][SQL] Fix StackOverflow caused by window functions inside 
aggregate functions


## What changes were proposed in this pull request?

This PR explicitly prohibits window functions inside aggregates. Currently, 
this will cause StackOverflow during analysis. See PR #19193 for previous 
discussion.

## How was this patch tested?

This PR comes with a dedicated unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aokolnychyi/spark 
fix-stackoverflow-window-funcs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21473.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21473


commit 646a96d75c12b2e3c6886bc0cc9743e7ba838c8a
Author: aokolnychyi 
Date:   2018-05-31T10:58:29Z

[SPARK-21896][SQL] Fix StackOverflow caused by window functions inside 
aggregate functions




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org