[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r74555273

--- Diff: sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out ---
@@ -95,7 +95,7 @@ select a, b from data group by -1
 struct<>
 -- !query 8 output
 org.apache.spark.sql.AnalysisException
-GROUP BY position -1 is not in select list (valid range is [1, 2]); line 1 pos 31
+GROUP BY position -1 is not in select list (valid range is [1, 2]); line 1 pos 22
--- End diff --

why does the position change?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r74556830

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2223,3 +2223,29 @@ object TimeWindowing extends Rule[LogicalPlan] {
       }
     }
 }
+
+/**
+ * Replaces ordinal in 'order by' or 'group by' with unresolved UnresolvedOrdinal expression.
+ */
+class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan] {
+  private def isIntegerLiteral(sorter: Expression) = IntegerIndex.unapply(sorter).nonEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case s @ Sort(orders, _, _) if conf.orderByOrdinal &&
+      orders.exists(o => isIntegerLiteral(o.child)) =>
+      val newOrders = orders.map {
+        case order @ SortOrder(IntegerIndex(index), _) =>
+          order.copy(child = UnresolvedOrdinal(index))
--- End diff --

we need a way to move the line position information.
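To illustrate the concern in the comment above: when a rule swaps a parsed literal for a new expression node, the new node does not automatically inherit the source position of the node it replaces. The following is only a toy sketch with stand-in classes (`Origin`, `CurrentOrigin`, `Literal`, and `UnresolvedOrdinal` here are hypothetical simplifications, not Spark's real classes). Catalyst has a helper of a similar shape, `CurrentOrigin.withOrigin`, but whether this PR ends up using it is not shown in the thread.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's classes.
case class Origin(line: Option[Int] = None, startPosition: Option[Int] = None)

object CurrentOrigin {
  private var current = Origin()
  def get: Origin = current
  // Runs `body` with `o` as the ambient origin, then restores the previous one.
  def withOrigin[A](o: Origin)(body: => A): A = {
    val saved = current
    current = o
    try body finally current = saved
  }
}

// Every node captures the ambient origin at construction time.
sealed trait Expr { val origin: Origin = CurrentOrigin.get }
case class Literal(value: Int) extends Expr
case class UnresolvedOrdinal(ordinal: Int) extends Expr

// A "parser" creates the literal while positioned at line 1, offset 31.
val parsed = CurrentOrigin.withOrigin(Origin(Some(1), Some(31)))(Literal(-1))

// Naive substitution: the new node is built with no ambient origin, so the position is lost.
val naive = UnresolvedOrdinal(parsed.value)

// Substitution that carries the position over from the node being replaced.
val carried = CurrentOrigin.withOrigin(parsed.origin)(UnresolvedOrdinal(parsed.value))

println(s"naive:   ${naive.origin}")
println(s"carried: ${carried.origin}")
```

In this sketch only `carried` keeps the line and offset of the original literal, which is the property an error message needs in order to point back at the ordinal.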
[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r74556035

--- Diff: sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out ---
@@ -95,7 +95,7 @@ select a, b from data group by -1
 struct<>
 -- !query 8 output
 org.apache.spark.sql.AnalysisException
-GROUP BY position -1 is not in select list (valid range is [1, 2]); line 1 pos 31
+GROUP BY position -1 is not in select list (valid range is [1, 2]); line 1 pos 22
--- End diff --

Sorry, I will fix this. I didn't understand the meaning of pos before.
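For reference, the two `pos` values in the diff above are consistent with reading `pos` as a zero-based character offset into the query line. That reading is an assumption inferred from the numbers, not something stated in this thread. Under it, 31 is where the `-1` ordinal starts and 22 is where `group` starts, so the changed output points at the GROUP BY clause rather than at the offending ordinal. A quick check in Scala:

```scala
// Query text taken from the test case in the diff above.
val query = "select a, b from data group by -1"

// Zero-based character offsets of the two tokens of interest.
val ordinalPos = query.indexOf("-1")    // start of the ordinal
val groupPos   = query.indexOf("group") // start of the GROUP BY clause

println(s"ordinal at $ordinalPos, group at $groupPos")
// prints "ordinal at 31, group at 22"
```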
[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r7466

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2223,3 +2223,29 @@ object TimeWindowing extends Rule[LogicalPlan] {
       }
     }
 }
+
+/**
+ * Replaces ordinal in 'order by' or 'group by' with unresolved UnresolvedOrdinal expression.
+ */
+class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan] {
--- End diff --

Ok.
[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r74555215

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -2223,3 +2223,29 @@ object TimeWindowing extends Rule[LogicalPlan] {
       }
     }
 }
+
+/**
+ * Replaces ordinal in 'order by' or 'group by' with unresolved UnresolvedOrdinal expression.
+ */
+class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan] {
--- End diff --

If we end up doing it this way, move this to its own file and create an individual test suite. The analyzer file is getting too large.
[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...
GitHub user clockfly opened a pull request:

https://github.com/apache/spark/pull/14616

[SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY

## What changes were proposed in this pull request?

This PR adds two unresolved expressions, `GroupByOrdinal` and `OrderByOrdinal`, to represent the ordinal in GROUP BY or ORDER BY, and fixes the rules that resolve ordinals.

An ordinal in GROUP BY or ORDER BY, like `1` in `order by 1` or `group by 1`, should be treated as an unresolved expression before analysis. In the current code, however, it is represented directly as a `Literal` expression, which is a resolved expression. This can cause an analysis failure if a rule requires the ordinal to be resolved before the rule is applied.

**For example:**

Before this fix, rule `ResolveAggregateFunctions` will try to resolve the `Filter` before `Filter`'s child `Aggregate` is fully resolved (`Aggregate` contains an unresolved group by ordinal `2`):

```
'Filter ('a > 0)
+- Aggregate [2], [count(1) AS count(1)#83L, a#81]
   +- SubqueryAlias tmp
      +- Project [1 AS a#81]
         +- OneRowRelation$
```

### Before this change

The ordinal is stored as a `Literal` expression:

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true
+- 'Aggregate [1], ['a]
   +- 'UnresolvedRelation `t`
```

This causes an analysis error when rule `ResolveAggregateFunctions` is applied, because group by ordinal `2` claims to be resolved but has not actually been resolved.

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to Group by position: '2' exceeds the size of the select list '1'. on unresolved object, tree:
Aggregate [2], [(a#9 > 0) AS havingCondition#15]
+- SubqueryAlias t
   +- Project [value#7 AS a#9]
      +- LocalRelation [value#7]
...
```

### After this change

Ordinals are stored as `GroupByOrdinal` or `OrderByOrdinal`.

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [orderbyordinal(1) ASC], true
+- 'Aggregate [groupbyordinal(1)], ['a]
   +- 'UnresolvedRelation `t`
```

Rule `ResolveAggregateFunctions` can now be applied safely, because `GroupByOrdinal(2)` is explicitly resolved before this rule is applied.

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
+--------+---+
|count(a)|  a|
+--------+---+
|       1|  1|
+--------+---+
```

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/clockfly/spark spark-16955

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14616.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14616

commit 40873650c7397a339210092f616c15aedbf13b17
Author: Sean Zhong
Date: 2016-08-08T21:40:53Z

    [SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY
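The core idea of the PR description above (a bare `Literal` looks resolved to every rule, while a dedicated ordinal expression does not) can be illustrated with a small toy model. This is a sketch under stated assumptions, not the PR's implementation: `Expr`, `Column`, `Aggregate`, and `resolveOrdinals` below are hypothetical simplifications, and only the name `GroupByOrdinal` is borrowed from the description.

```scala
// Toy expression model for illustration only; these are NOT Spark's classes.
sealed trait Expr { def resolved: Boolean }
case class Literal(value: Int) extends Expr { val resolved = true }
case class Column(name: String) extends Expr { val resolved = true }
// An ordinal that still has to be replaced by a select-list entry.
case class GroupByOrdinal(ordinal: Int) extends Expr { val resolved = false }

case class Aggregate(grouping: Seq[Expr], output: Seq[Expr]) {
  // A node counts as resolved only when all of its expressions are.
  def resolved: Boolean = (grouping ++ output).forall(_.resolved)
}

// Replaces each in-range ordinal with the select-list entry it points at (1-based).
def resolveOrdinals(agg: Aggregate): Aggregate =
  agg.copy(grouping = agg.grouping.map {
    case GroupByOrdinal(i) if i >= 1 && i <= agg.output.size => agg.output(i - 1)
    case other => other
  })

// Select list modelled after: select count(a), a from t group by 2
val selectList = Seq(Column("count(a)"), Column("a"))

// Before: the ordinal is a Literal, so the node claims to be resolved too early.
val before = Aggregate(Seq(Literal(2)), selectList)

// After: the ordinal is marked unresolved until it has been substituted.
val after = Aggregate(Seq(GroupByOrdinal(2)), selectList)
val fixed = resolveOrdinals(after)

println(s"before.resolved = ${before.resolved}")
println(s"after.resolved  = ${after.resolved}")
println(s"fixed.grouping  = ${fixed.grouping}")
```

In the toy model, any rule guarded by `resolved` would run on `before` even though `2` has not yet been mapped to column `a`, whereas `after` holds such rules back until `resolveOrdinals` has run.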