[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18559


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126083465
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-21335: support un-aliased subquery") {
+withTempView("v") {
+  Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")
+  checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1))
+
+  val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i 
FROM v)"))
+  assert(e.message ==
+"cannot resolve '`v.i`' given input columns: 
[_auto_generated_subquery_name.i]")
--- End diff --

Yes. This is a wrong SQL query.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126083002
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -751,15 +751,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
* hooks.
*/
   override def visitAliasedQuery(ctx: AliasedQueryContext): LogicalPlan = 
withOrigin(ctx) {
-// The unaliased subqueries in the FROM clause are disallowed. Instead 
of rejecting it in
-// parser rules, we handle it here in order to provide better error 
message.
-if (ctx.strictIdentifier == null) {
-  throw new ParseException("The unaliased subqueries in the FROM 
clause are not supported.",
-ctx)
+val alias = if (ctx.strictIdentifier == null) {
+  // For un-aliased subqueries, use a default alias name that is not 
likely to conflict with
+  // normal subquery names, so that parent operators can only access 
the columns in subquery by
+  // unqualified names. Users can still use this special qualifier to 
access columns if they
+  // know it, but that's not recommended.
+  "_auto_generated_subquery_name"
--- End diff --

How about using two underscores? `__spark_auto_generated_subquery_name`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126072754
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-21335: support un-aliased subquery") {
+withTempView("v") {
+  Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")
+  checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1))
+
+  val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i 
FROM v)"))
+  assert(e.message ==
+"cannot resolve '`v.i`' given input columns: 
[_auto_generated_subquery_name.i]")
--- End diff --

yea that seems wrong ...



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126072760
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-21335: support un-aliased subquery") {
+withTempView("v") {
+  Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")
+  checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1))
+
+  val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i 
FROM v)"))
+  assert(e.message ==
+"cannot resolve '`v.i`' given input columns: 
[_auto_generated_subquery_name.i]")
--- End diff --

It's supported since 2.0.X, so definitely there are existing user queires 
and apps. I'm agreeing with this PR and want to understand the scope of 
changes. It looks good to me.
```scala
scala> sc.version
res0: String = 2.0.2

scala> Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")

scala> sql("SELECT v.i from (SELECT i FROM v)").show
+---+
|  i|
+---+
|  1|
+---+
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126072118
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-21335: support un-aliased subquery") {
+withTempView("v") {
+  Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")
+  checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1))
+
+  val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i 
FROM v)"))
+  assert(e.message ==
+"cannot resolve '`v.i`' given input columns: 
[_auto_generated_subquery_name.i]")
--- End diff --

we may have, but this is definitely wrong IMO. BTW at least we don't have 
this usage in our tests, so I think it's probably fine. also cc @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126071406
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-21335: support un-aliased subquery") {
+withTempView("v") {
+  Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")
+  checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1))
+
+  val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i 
FROM v)"))
+  assert(e.message ==
+"cannot resolve '`v.i`' given input columns: 
[_auto_generated_subquery_name.i]")
--- End diff --

Do we have such usage in existing queries?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126071223
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   }
 }
   }
+
+  test("SPARK-21335: support un-aliased subquery") {
+withTempView("v") {
+  Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")
+  checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1))
+
+  val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i 
FROM v)"))
+  assert(e.message ==
+"cannot resolve '`v.i`' given input columns: 
[_auto_generated_subquery_name.i]")
--- End diff --

Then, the scope of breaking change is reduced into this kind of queries?

```scala
scala> sc.version
res0: String = 2.1.1

scala> Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")

scala> sql("SELECT v.i from (SELECT i FROM v)").show
+---+
|  i|
+---+
|  1|
+---+
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126066595
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/string-functions.sql.out ---
@@ -30,20 +30,20 @@ abc
 
 -- !query 3
 EXPLAIN EXTENDED SELECT (col1 || col2 || col3 || col4) col
-FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) t
+FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10))
 -- !query 3 schema
 struct
 -- !query 3 output
 == Parsed Logical Plan ==
 'Project [concat(concat(concat('col1, 'col2), 'col3), 'col4) AS col#x]
-+- 'SubqueryAlias t
++- 'SubqueryAlias _auto_generated_subquery_name
--- End diff --

I think it's ok, as the name is quite clear about it's auto-generated. And 
I think it's hard to hide it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126066489
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/string-functions.sql.out ---
@@ -30,20 +30,20 @@ abc
 
 -- !query 3
 EXPLAIN EXTENDED SELECT (col1 || col2 || col3 || col4) col
-FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) t
+FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10))
 -- !query 3 schema
 struct
 -- !query 3 output
 == Parsed Logical Plan ==
 'Project [concat(concat(concat('col1, 'col2), 'col3), 'col4) AS col#x]
-+- 'SubqueryAlias t
++- 'SubqueryAlias _auto_generated_subquery_name
--- End diff --

Do we want to show the internal subquery name?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18559#discussion_r126066311
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -751,15 +751,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
* hooks.
*/
   override def visitAliasedQuery(ctx: AliasedQueryContext): LogicalPlan = 
withOrigin(ctx) {
-// The unaliased subqueries in the FROM clause are disallowed. Instead 
of rejecting it in
-// parser rules, we handle it here in order to provide better error 
message.
-if (ctx.strictIdentifier == null) {
-  throw new ParseException("The unaliased subqueries in the FROM 
clause are not supported.",
-ctx)
+val alias = if (ctx.strictIdentifier == null) {
+  // For un-aliased subqueries, ues a default alias name that is not 
likely to conflict with
--- End diff --

nit: typo `ues`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/18559

[SPARK-21335][SQL] support un-aliased subquery

## What changes were proposed in this pull request?

un-aliased subquery is supported by Spark SQL for a long time. Its semantic 
was not well defined and had confusing behaviors, and it's not a standard SQL 
syntax, so we disallowed it in 
https://issues.apache.org/jira/browse/SPARK-20690 .

However, this is a breaking change, and we do have existing queries using 
un-aliased subquery. We should add the support back and fix its semantic.

This PR fixes the un-aliased subquery by assigning a default alias name.

## How was this patch tested?

new regression test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark sub-query

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18559


commit 4d99c11802efa2d6ee5c36de5941226bf12e1a55
Author: Wenchen Fan 
Date:   2017-07-07T04:03:34Z

support un-aliased subquery




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org