[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18559 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126083465 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- Yes. This is a wrong SQL query. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126083002 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -751,15 +751,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { * hooks. */ override def visitAliasedQuery(ctx: AliasedQueryContext): LogicalPlan = withOrigin(ctx) { -// The unaliased subqueries in the FROM clause are disallowed. Instead of rejecting it in -// parser rules, we handle it here in order to provide better error message. -if (ctx.strictIdentifier == null) { - throw new ParseException("The unaliased subqueries in the FROM clause are not supported.", -ctx) +val alias = if (ctx.strictIdentifier == null) { + // For un-aliased subqueries, use a default alias name that is not likely to conflict with + // normal subquery names, so that parent operators can only access the columns in subquery by + // unqualified names. Users can still use this special qualifier to access columns if they + // know it, but that's not recommended. + "_auto_generated_subquery_name" --- End diff -- How about using two underscores? `__spark_auto_generated_subquery_name`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072754 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- yea that seems wrong ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072760 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- It's supported since 2.0.X, so definitely there are existing user queires and apps. I'm agreeing with this PR and want to understand the scope of changes. It looks good to me. ```scala scala> sc.version res0: String = 2.0.2 scala> Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") scala> sql("SELECT v.i from (SELECT i FROM v)").show +---+ | i| +---+ | 1| +---+ ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072118 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- we may have, but this is definitely wrong IMO. BTW at least we don't have this usage in our tests, so I think it's probably fine. also cc @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126071406 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- Do we have such usage in existing queries? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126071223 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- Then, the scope of breaking change is reduced into this kind of queries? ```scala scala> sc.version res0: String = 2.1.1 scala> Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") scala> sql("SELECT v.i from (SELECT i FROM v)").show +---+ | i| +---+ | 1| +---+ ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066595 --- Diff: sql/core/src/test/resources/sql-tests/results/string-functions.sql.out --- @@ -30,20 +30,20 @@ abc -- !query 3 EXPLAIN EXTENDED SELECT (col1 || col2 || col3 || col4) col -FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) t +FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) -- !query 3 schema struct -- !query 3 output == Parsed Logical Plan == 'Project [concat(concat(concat('col1, 'col2), 'col3), 'col4) AS col#x] -+- 'SubqueryAlias t ++- 'SubqueryAlias _auto_generated_subquery_name --- End diff -- I think it's ok, as the name is quite clear about it's auto-generated. And I think it's hard to hide it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066489 --- Diff: sql/core/src/test/resources/sql-tests/results/string-functions.sql.out --- @@ -30,20 +30,20 @@ abc -- !query 3 EXPLAIN EXTENDED SELECT (col1 || col2 || col3 || col4) col -FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) t +FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) -- !query 3 schema struct -- !query 3 output == Parsed Logical Plan == 'Project [concat(concat(concat('col1, 'col2), 'col3), 'col4) AS col#x] -+- 'SubqueryAlias t ++- 'SubqueryAlias _auto_generated_subquery_name --- End diff -- Do we want to show the internal subquery name? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066311 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -751,15 +751,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { * hooks. */ override def visitAliasedQuery(ctx: AliasedQueryContext): LogicalPlan = withOrigin(ctx) { -// The unaliased subqueries in the FROM clause are disallowed. Instead of rejecting it in -// parser rules, we handle it here in order to provide better error message. -if (ctx.strictIdentifier == null) { - throw new ParseException("The unaliased subqueries in the FROM clause are not supported.", -ctx) +val alias = if (ctx.strictIdentifier == null) { + // For un-aliased subqueries, ues a default alias name that is not likely to conflict with --- End diff -- nit: typo `ues`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/18559 [SPARK-21335][SQL] support un-aliased subquery ## What changes were proposed in this pull request? un-aliased subquery is supported by Spark SQL for a long time. Its semantic was not well defined and had confusing behaviors, and it's not a standard SQL syntax, so we disallowed it in https://issues.apache.org/jira/browse/SPARK-20690 . However, this is a breaking change, and we do have existing queries using un-aliased subquery. We should add the support back and fix its semantic. This PR fixes the un-aliased subquery by assigning a default alias name. ## How was this patch tested? new regression test You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark sub-query Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18559.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18559 commit 4d99c11802efa2d6ee5c36de5941226bf12e1a55 Author: Wenchen Fan Date: 2017-07-07T04:03:34Z support un-aliased subquery --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org