[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21473 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21473#discussion_r192292014 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1744,11 +1744,14 @@ class Analyzer( *it into the plan tree. */ object ExtractWindowExpressions extends Rule[LogicalPlan] { -private def hasWindowFunction(projectList: Seq[NamedExpression]): Boolean = - projectList.exists(hasWindowFunction) +private def hasWindowFunction(exprs: Seq[Expression]): Boolean = + exprs.exists(hasWindowFunction) -private def hasWindowFunction(expr: NamedExpression): Boolean = { +private def hasWindowFunction(expr: Expression): Boolean = { expr.find { +case AggregateExpression(aggFunc, _, _, _) if hasWindowFunction(aggFunc.children) => --- End diff -- It's weird to throw exception inside a boolean function `hasXXX`. Can we do this check in https://github.com/apache/spark/pull/21473/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R1838 ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21473#discussion_r192284247 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -687,4 +687,29 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21896: Window functions inside aggregate functions") { +def checkWindowError(df: => DataFrame): Unit = { + val thrownException = the [AnalysisException] thrownBy { +df.queryExecution.analyzed + } + assert(thrownException.message.contains("not allowed to use a window function")) +} + + checkWindowError(testData2.select(min(avg('b).over(Window.partitionBy('a) +checkWindowError(testData2.agg(sum('b), max(rank().over(Window.orderBy('a) +checkWindowError(testData2.groupBy('a).agg(sum('b), max(rank().over(Window.orderBy('b) + checkWindowError(testData2.groupBy('a).agg(max(sum(sum('b)).over(Window.orderBy('b) + +checkWindowError( + sql("SELECT MAX(RANK() OVER(ORDER BY b)) FROM testData2 GROUP BY a HAVING SUM(b) = 3")) +checkWindowError( + sql("SELECT MAX(RANK() OVER(ORDER BY a)) FROM testData2")) +checkWindowError( + sql("SELECT MAX(RANK() OVER(ORDER BY b)) FROM testData2 GROUP BY a")) +checkAnswer( + sql("SELECT a, MAX(b), RANK() OVER(ORDER BY a) FROM testData2 GROUP BY a HAVING SUM(b) = 3"), --- End diff -- I think the dataset version should be ``` df.groupBy('a).agg(max('b), sum('b).as("sumb"), rank().over(window)).where('sumb === 5) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/21473#discussion_r192234621 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1744,11 +1744,14 @@ class Analyzer( *it into the plan tree. */ object ExtractWindowExpressions extends Rule[LogicalPlan] { -private def hasWindowFunction(projectList: Seq[NamedExpression]): Boolean = - projectList.exists(hasWindowFunction) +private def hasWindowFunction(exprs: Seq[Expression]): Boolean = + exprs.exists(hasWindowFunction) -private def hasWindowFunction(expr: NamedExpression): Boolean = { +private def hasWindowFunction(expr: Expression): Boolean = { expr.find { +case AggregateExpression(aggFunc, _, _, _) if hasWindowFunction(aggFunc.children) => --- End diff -- I have some doubts that this is the best place for this check. StackOverflow happens in ``extract``. We can also define a separate method and call it inside ``extract``. However, that method will share the same structure as ``hasWindowFunction``. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21473: [SPARK-21896][SQL] Fix StackOverflow caused by wi...
GitHub user aokolnychyi opened a pull request: https://github.com/apache/spark/pull/21473 [SPARK-21896][SQL] Fix StackOverflow caused by window functions inside aggregate functions ## What changes were proposed in this pull request? This PR explicitly prohibits window functions inside aggregates. Currently, this will cause StackOverflow during analysis. See PR #19193 for previous discussion. ## How was this patch tested? This PR comes with a dedicated unit test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aokolnychyi/spark fix-stackoverflow-window-funcs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21473.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21473 commit 646a96d75c12b2e3c6886bc0cc9743e7ba838c8a Author: aokolnychyi Date: 2018-05-31T10:58:29Z [SPARK-21896][SQL] Fix StackOverflow caused by window functions inside aggregate functions --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org