[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...

2016-08-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r74555273
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out ---
@@ -95,7 +95,7 @@ select a, b from data group by -1
 struct<>
 -- !query 8 output
 org.apache.spark.sql.AnalysisException
-GROUP BY position -1 is not in select list (valid range is [1, 2]); line 1 pos 31
+GROUP BY position -1 is not in select list (valid range is [1, 2]); line 1 pos 22
--- End diff --

why does the position change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...

2016-08-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r74556830
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2223,3 +2223,29 @@ object TimeWindowing extends Rule[LogicalPlan] {
   }
   }
 }
+
+/**
+ * Replaces ordinal in 'order by' or 'group by' with unresolved UnresolvedOrdinal expression.
+ */
+class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan] {
+  private def isIntegerLiteral(sorter: Expression) = IntegerIndex.unapply(sorter).nonEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case s @ Sort(orders, _, _) if conf.orderByOrdinal &&
+      orders.exists(o => isIntegerLiteral(o.child)) =>
+      val newOrders = orders.map {
+        case order @ SortOrder(IntegerIndex(index), _) =>
+          order.copy(child = UnresolvedOrdinal(index))
--- End diff --

we need a way to move the line position information.
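
To illustrate the concern, here is a minimal sketch using a toy expression model rather than Catalyst's classes. `Origin`, `IntLiteral`, `OrdinalRef` and `substitute` are all hypothetical names; the only point is that the replacement node should inherit the position of the literal it replaces. In Catalyst the analogous tool is, as far as I recall, `CurrentOrigin.withOrigin`, but treat that as an assumption and check the source.

```scala
// Toy model, NOT Spark's classes: every node remembers where it came from.
object OriginSketch {
  final case class Origin(line: Int, pos: Int)

  sealed trait Expr { def origin: Origin }
  final case class IntLiteral(value: Int, origin: Origin) extends Expr
  final case class OrdinalRef(index: Int, origin: Origin) extends Expr

  // Rewrite an integer literal into an ordinal placeholder, carrying the
  // literal's origin over so later error messages still point at it.
  def substitute(e: Expr): Expr = e match {
    case IntLiteral(v, o) => OrdinalRef(v, o)
    case other            => other
  }

  def main(args: Array[String]): Unit = {
    val lit = IntLiteral(-1, Origin(line = 1, pos = 31))
    println(substitute(lit))
  }
}
```

If the new node were built without the old origin, an error raised on it would report whatever position happened to be current at construction time instead of the ordinal's own position.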





[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...

2016-08-12 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r74556035
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out ---
@@ -95,7 +95,7 @@ select a, b from data group by -1
 struct<>
 -- !query 8 output
 org.apache.spark.sql.AnalysisException
-GROUP BY position -1 is not in select list (valid range is [1, 2]); line 1 pos 31
+GROUP BY position -1 is not in select list (valid range is [1, 2]); line 1 pos 22
--- End diff --

Sorry, I will fix this. I didn't understand the meaning of pos before.
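
For reference, both `pos` values in the diff are consistent with zero-based character offsets into the query text. A quick check, assuming the query is exactly the text shown in the hunk header (`PosCheck` is a name made up for this note):

```scala
// Assumption: `pos` is a zero-based character offset within the line,
// and the query is exactly the text shown in the hunk header.
object PosCheck {
  val query = "select a, b from data group by -1"

  // Offset of the ordinal itself.
  val ordinalPos: Int = query.indexOf("-1")

  // Offset of the start of the GROUP BY clause.
  val clausePos: Int = query.indexOf("group by")

  def main(args: Array[String]): Unit =
    println(s"ordinal at $ordinalPos, clause at $clausePos")
}
```

Under that assumption the old expected value (31) points at the ordinal and the new one (22) points at the start of the clause, which would fit the rewritten expression having lost its original position.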





[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...

2016-08-12 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r7466
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2223,3 +2223,29 @@ object TimeWindowing extends Rule[LogicalPlan] {
   }
   }
 }
+
+/**
+ * Replaces ordinal in 'order by' or 'group by' with unresolved UnresolvedOrdinal expression.
+ */
+class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan] {
--- End diff --

Ok.





[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...

2016-08-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14616#discussion_r74555215
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2223,3 +2223,29 @@ object TimeWindowing extends Rule[LogicalPlan] {
   }
   }
 }
+
+/**
+ * Replaces ordinal in 'order by' or 'group by' with unresolved UnresolvedOrdinal expression.
+ */
+class UnresolvedOrdinalSubstitution(conf: CatalystConf) extends Rule[LogicalPlan] {
--- End diff --

if we end up doing it this way, move this to its own file and create an individual test suite.

the analyzer file is getting too large.





[GitHub] spark pull request #14616: [SPARK-16955][SQL] Fix analysis error when using ...

2016-08-11 Thread clockfly
GitHub user clockfly opened a pull request:

https://github.com/apache/spark/pull/14616

[SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or 
GROUP BY

## What changes were proposed in this pull request?

This PR adds two unresolved expressions, `GroupByOrdinal` and `OrderByOrdinal`, 
to represent the ordinal in GROUP BY or ORDER BY, and fixes the rules that 
resolve ordinals.

Ordinals in GROUP BY or ORDER BY, like `1` in `order by 1` or `group by 1`, 
should be treated as unresolved expressions before analysis. But in the current 
code, an ordinal is represented directly as a `Literal` expression, which is a 
resolved expression. This may cause analysis failure if a rule requires the 
ordinal to be resolved before it is applied.
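
A minimal sketch of the problem, using a toy expression model rather than Catalyst's classes (`IntLiteral`, `OrdinalPlaceholder` and `GroupBy` are hypothetical stand-ins): a node counts as resolved when all of its children do, so an ordinal stored as a plain literal makes its parent look resolved too early.

```scala
// Toy model, NOT Spark's classes.
object ResolvedSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    // By default a node is resolved when all of its children are.
    def resolved: Boolean = children.forall(_.resolved)
  }

  // A literal is a leaf, so it is always resolved.
  final case class IntLiteral(value: Int) extends Expr {
    val children: Seq[Expr] = Nil
  }

  // A placeholder for an ordinal that still has to be replaced.
  final case class OrdinalPlaceholder(index: Int) extends Expr {
    val children: Seq[Expr] = Nil
    override def resolved: Boolean = false
  }

  final case class GroupBy(keys: Seq[Expr]) extends Expr {
    def children: Seq[Expr] = keys
  }

  def main(args: Array[String]): Unit = {
    // Ordinal kept as a literal: the parent claims to be resolved.
    println(GroupBy(Seq(IntLiteral(2))).resolved)
    // Ordinal kept as a placeholder: the parent stays unresolved.
    println(GroupBy(Seq(OrdinalPlaceholder(2))).resolved)
  }
}
```

Any rule guarded by a `resolved` check would fire on the first plan even though the ordinal has not been replaced yet, which is the failure mode described above.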

**For example:**

Before this fix, rule `ResolveAggregateFunctions` tries to resolve the 
`Filter` before its child `Aggregate` is fully resolved (the `Aggregate` 
still contains an unresolved GROUP BY ordinal `2`):

```
'Filter ('a > 0)
+- Aggregate [2], [count(1) AS count(1)#83L, a#81]
   +- SubqueryAlias tmp
      +- Project [1 AS a#81]
         +- OneRowRelation$
```

### Before this change

The ordinal is stored as a `Literal` expression:

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [1 ASC], true
+- 'Aggregate [1], ['a]
   +- 'UnresolvedRelation `t`
```

This causes an analysis error when applying rule `ResolveAggregateFunctions`, 
because the GROUP BY ordinal `2` claims to be resolved but has not actually 
been resolved.

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
Group by position: '2' exceeds the size of the select list '1'. on unresolved 
object, tree:
Aggregate [2], [(a#9 > 0) AS havingCondition#15]
+- SubqueryAlias t
   +- Project [value#7 AS a#9]
      +- LocalRelation [value#7]
...
```

### After this change

Ordinals are stored as `GroupByOrdinal` or `OrderByOrdinal`.

```
scala> sc.setLogLevel("TRACE")
scala> sql("select a from t group by 1 order by 1")
...
'Sort [orderbyordinal(1) ASC], true
+- 'Aggregate [groupbyordinal(1)], ['a]
   +- 'UnresolvedRelation `t`
```

Rule `ResolveAggregateFunctions` can now be applied safely, because 
`GroupByOrdinal(2)` is explicitly resolved before the rule runs:

```
scala> Seq(1).toDF("a").createOrReplaceTempView("t")
scala> sql("select count(a), a from t group by 2 having a > 0").show
+--------+---+
|count(a)|  a|
+--------+---+
|       1|  1|
+--------+---+
```
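
The resolution step itself can be pictured as a 1-based lookup into the select list with a range check. The sketch below is a simplified stand-in, not the rule from this PR: `OrdinalLookupSketch` and `resolve` are hypothetical names, the select list is modeled as plain strings, and the message mirrors the wording in the `group-by-ordinal.sql.out` test output.

```scala
// Simplified stand-in for ordinal resolution; NOT the analyzer rule itself.
object OrdinalLookupSketch {
  // Resolve a 1-based ordinal against the select list, or explain why not.
  def resolve(index: Int, selectList: Seq[String]): Either[String, String] =
    if (index >= 1 && index <= selectList.size) {
      Right(selectList(index - 1))
    } else {
      Left(
        s"GROUP BY position $index is not in select list " +
          s"(valid range is [1, ${selectList.size}])")
    }

  def main(args: Array[String]): Unit = {
    // `group by 2` over `select count(a), a` picks the second item.
    println(resolve(2, Seq("count(a)", "a")))
    // `group by -1` over `select a, b` is out of range.
    println(resolve(-1, Seq("a", "b")))
  }
}
```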

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/clockfly/spark spark-16955

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14616.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14616


commit 40873650c7397a339210092f616c15aedbf13b17
Author: Sean Zhong 
Date:   2016-08-08T21:40:53Z

[SPARK-16955][SQL] Fix analysis error when using ordinal in ORDER BY or 
GROUP BY



