GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/14616
[SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY

## What changes were proposed in this pull request?

This PR adds two unresolved expressions, `GroupByOrdinal` and `OrderByOrdinal`, to represent ordinals in GROUP BY or ORDER BY, and fixes the rules that resolve ordinals.

An ordinal in GROUP BY or ORDER BY, like the `1` in `order by 1` or `group by 1`, should be treated as an unresolved expression before analysis. In the current code, however, it is represented directly as a `Literal` expression, which is a resolved expression. This can cause an analysis failure if a rule requires the ordinal to be resolved before the rule is applied.

**For example:**
Before this fix, rule `ResolveAggregateFunctions` tries to resolve the `Filter` before the `Filter`'s child `Aggregate` is fully resolved (the `Aggregate` contains an unresolved group-by ordinal `2`):

```
'Filter ('a > 0)
+- Aggregate [2], [count(1) AS count(1)#83L, a#81]
   +- SubqueryAlias tmp
      +- Project [1 AS a#81]
         +- OneRowRelation$
```

### Before this change

Ordinals are stored as `Literal` expressions.

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true
+- 'Aggregate [1], ['a]
   +- 'UnresolvedRelation `t`
```

This causes an analysis error when rule `ResolveAggregateFunctions` is applied, because the group-by ordinal `2` claims to be resolved but actually is not.

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to Group by position: '2' exceeds the size of the select list '1'. on unresolved object, tree:
Aggregate [2], [(a#9 > 0) AS havingCondition#15]
+- SubqueryAlias t
   +- Project [value#7 AS a#9]
      +- LocalRelation [value#7]
...
```

### After this change

Ordinals are stored as `GroupByOrdinal` or `OrderByOrdinal`.

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [orderbyordinal(1) ASC], true
+- 'Aggregate [groupbyordinal(1)], ['a]
   +- 'UnresolvedRelation `t`
```

Rule `ResolveAggregateFunctions` can now be applied safely, because `GroupByOrdinal(2)` is explicitly resolved before this rule is applied.

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
+--------+---+
|count(a)|  a|
+--------+---+
|       1|  1|
+--------+---+
```

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/clockfly/spark spark-16955

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14616

----
commit 40873650c7397a339210092f616c15aedbf13b17
Author: Sean Zhong <seanzh...@databricks.com>
Date:   2016-08-08T21:40:53Z

    [SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or GROUP BY

----
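
For readers unfamiliar with the analyzer, the idea in the description can be sketched with a toy model. This is not the code in this patch and does not use Spark's Catalyst classes: `Expr`, `Literal`, `Column`, `GroupByOrdinal`, `Aggregate`, and `resolveOrdinals` below are all hypothetical stand-ins, assumed only for illustration. The sketch shows why a literal ordinal makes the aggregate look resolved too early, while a dedicated placeholder keeps it unresolved until the ordinal has been substituted.

```scala
// Toy model only: every name here is hypothetical, not Spark's actual API.
sealed trait Expr { def resolved: Boolean }
case class Literal(value: Int) extends Expr { val resolved = true }
case class Column(name: String) extends Expr { val resolved = true }
// An ordinal is a placeholder, so it reports itself as unresolved.
case class GroupByOrdinal(position: Int) extends Expr { val resolved = false }

case class Aggregate(grouping: Seq[Expr], selectList: Seq[Expr]) {
  // The node counts as resolved only when all of its expressions are resolved.
  def resolved: Boolean =
    grouping.forall(_.resolved) && selectList.forall(_.resolved)
}

object OrdinalSketch {
  // Replace each 1-based ordinal with the select-list item it points to.
  def resolveOrdinals(agg: Aggregate): Aggregate =
    agg.copy(grouping = agg.grouping.map {
      case GroupByOrdinal(pos) if pos >= 1 && pos <= agg.selectList.size =>
        agg.selectList(pos - 1)
      case GroupByOrdinal(pos) =>
        throw new IllegalArgumentException(
          s"GROUP BY position $pos is outside the select list " +
            s"(size ${agg.selectList.size})")
      case other => other
    })

  def main(args: Array[String]): Unit = {
    val selectList = Seq(Column("count(a)"), Column("a"))
    val asLiteral = Aggregate(Seq(Literal(2)), selectList)
    val asOrdinal = Aggregate(Seq(GroupByOrdinal(2)), selectList)

    println(asLiteral.resolved) // true: looks resolved although `2` is still an ordinal
    println(asOrdinal.resolved) // false: later rules know they have to wait
    println(resolveOrdinals(asOrdinal).resolved) // true: ordinal replaced by column `a`
  }
}
```

In this model, a rule that checks `resolved` on its child before running would skip `asOrdinal` until `resolveOrdinals` has been applied, which mirrors the ordering problem described for `ResolveAggregateFunctions`.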