[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r662017168

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ##
@@ -807,4 +807,48 @@ class AnalysisErrorSuite extends AnalysisTest {
     // UnresolvedHint be removed by batch `Remove Unresolved Hints`
     assertAnalysisSuccess(plan, true)
   }
+
+  test("SPARK-35618: Resolve star expressions in subqueries") {
+    val a = AttributeReference("a", IntegerType)()
+    val b = AttributeReference("b", IntegerType)()
+    val t0 = OneRowRelation()
+    val t1 = LocalRelation(a, b).as("t1")
+
+    // t1.* in the subquery should be resolved into outer(t1.a) and outer(t1.b).
+    assertAnalysisError(
+      Project(ScalarSubquery(t0.select(star("t1"))).as("sub") :: Nil, t1),
+      "Scalar subquery must return only one column, but got 2" :: Nil)
+
+    // array(t1.*) in the subquery should be resolved into array(outer(t1.a), outer(t1.b))
+    val array = CreateArray(Seq(star("t1")))
+    assertAnalysisError(
+      Project(ScalarSubquery(t0.select(array)).as("sub") :: Nil, t1),
+      "Expressions referencing the outer query are not supported" :: Nil)

Review comment:
   Yes, a star in `CreateArray` is valid. I truncated this error message; the complete one is `"Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses"`. I will update the message to be clearer.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r662013726

## File path: sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out ##
@@ -80,6 +80,46 @@ struct
 1 2 0 3
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*)
+-- !query schema
+struct
+-- !query output
+0 1 0 1
+1 2 1 2
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2)
+-- !query schema
+struct
+-- !query output
+0 1 0 1 0 2
+0 1 0 1 0 3
+1 2 1 2 0 2
+1 2 1 2 0 3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.* FROM t2 AS t1)
+-- !query schema
+struct
+-- !query output
+0 1 0 2
+0 1 0 3
+1 2 0 2
+1 2 0 3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2, LATERAL (SELECT t1.*, t2.*, t3.* FROM t2 AS t3))

Review comment:
   Sounds good!
[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r660149328

## File path: sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out ##
@@ -80,6 +80,46 @@ struct
 1 2 0 3
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*)
+-- !query schema
+struct
+-- !query output
+0 1 0 1
+1 2 1 2
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2)
+-- !query schema
+struct
+-- !query output
+0 1 0 1 0 2
+0 1 0 1 0 3
+1 2 1 2 0 2
+1 2 1 2 0 3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.* FROM t2 AS t1)
+-- !query schema
+struct
+-- !query output
+0 1 0 2
+0 1 0 3
+1 2 0 2
+1 2 0 3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2, LATERAL (SELECT t1.*, t2.*, t3.* FROM t2 AS t3))

Review comment:
   Good question. Currently, Spark can only resolve a subquery using its immediate outer query. In this case, t1.* can only be resolved using the output of t2, which is c1, c2. Many other places need to be updated to support nested subqueries with deep correlations.
   https://github.com/apache/spark/blob/9c157a490bb59e02dcf44b14b411ea5beb68c238/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala#L164-L173
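The "immediate outer query only" limitation described in the comment above can be illustrated with a minimal sketch. This is not Spark code: `Attr`, `Scope`, and `resolveStar` are hypothetical stand-ins invented for this example, and the column names follow the c1, c2 mentioned in the comment. The sketch assumes scopes are listed innermost first and that only the current scope and the one directly enclosing it are searched.

```scala
object ImmediateOuterSketch {
  // Hypothetical stand-ins for Catalyst attributes and plans (not Spark's real classes).
  final case class Attr(qualifier: String, name: String)
  final case class Scope(output: Seq[Attr])

  // `scopes` is ordered innermost first. Only the current scope and its
  // immediate outer scope are searched; deeper outer scopes are ignored.
  def resolveStar(qualifier: String, scopes: List[Scope]): Option[Seq[Attr]] =
    scopes.take(2).iterator
      .map(_.output.filter(_.qualifier == qualifier))
      .find(_.nonEmpty)

  // Scopes for: SELECT * FROM t1, LATERAL (... FROM t2, LATERAL (... FROM t2 AS t3))
  val innermost: Scope = Scope(Seq(Attr("t3", "c1"), Attr("t3", "c2")))
  val middle: Scope = Scope(Seq(Attr("t2", "c1"), Attr("t2", "c2")))
  val outermost: Scope = Scope(Seq(Attr("t1", "c1"), Attr("t1", "c2")))
  val scopes: List[Scope] = List(innermost, middle, outermost)
}
```

Under these assumptions, `resolveStar("t2", scopes)` finds the two t2 columns through the immediate outer scope, while `resolveStar("t1", scopes)` returns `None` because t1 is two levels out.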
[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r658953235

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
@@ -1571,6 +1582,26 @@ class Analyzer(override val catalogManager: CatalogManager)
       }
     }

+    // Expand the star expression using the input plan first. If failed, try resolve
+    // the star expression using the outer query plan and wrap the resolved attributes
+    // in outer references. Otherwise throw the original exception.
+    private def expand(s: Star, plan: LogicalPlan): Seq[NamedExpression] = {
+      withPosition(s) {
+        try {
+          s.expand(plan, resolver)
+        } catch {
+          case e: AnalysisException =>
+            AnalysisContext.get.outerPlan.map(p =>
+              // Only a few unary nodes (Project/Aggregate) can host star expressions.
+              Try(s.expand(p.children.head, resolver)) match {

Review comment:
   The error message is confusing when a star is used somewhere other than Project and Filter. I will update `CheckAnalysis`.

   ```
   scala> sql("select * from t1 join t2 on t1.* = t2.c1")
   org.apache.spark.sql.AnalysisException: Invalid call to dataType on unresolved object;
   'Project [*]
   +- 'Join Inner, (ArrayBuffer(t1).* = c1#76)
      :- SubqueryAlias spark_catalog.default.t1
      :  +- View (`default`.`t1`, [c1#72,c2#73])
      :     +- Project [cast(col1#74 as int) AS c1#72, cast(col2#75 as int) AS c2#73]
      :        +- LocalRelation [col1#74, col2#75]
      +- SubqueryAlias spark_catalog.default.t2
         +- View (`default`.`t2`, [c1#76,c2#77])
            +- Project [cast(col1#78 as int) AS c1#76, cast(col2#79 as int) AS c2#77]
               +- LocalRelation [col1#78, col2#79]
   ```
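The fallback order described in the code comment of the quoted hunk (current plan first, then the outer plan with the results wrapped in outer references, otherwise rethrow the original error) can be sketched in isolation. This is a simplified model, not the PR's implementation: `Attr`, `OuterRef`, `Plan`, `ResolutionError`, and `expandIn` are hypothetical names, and `Left`/`Right` are used here only to mark whether an attribute came from the outer or the current plan.

```scala
import scala.util.{Failure, Success, Try}

object StarFallbackSketch {
  // Hypothetical stand-ins for Catalyst types (not Spark's real classes).
  final case class Attr(qualifier: String, name: String)
  final case class OuterRef(attr: Attr)
  final case class Plan(output: Seq[Attr])
  final class ResolutionError(msg: String) extends Exception(msg)

  // Expand `target.*` against a single plan; throws when nothing matches.
  def expandIn(target: String, plan: Plan): Seq[Attr] = {
    val matched = plan.output.filter(_.qualifier == target)
    if (matched.isEmpty) throw new ResolutionError(s"cannot resolve '$target.*'")
    matched
  }

  // Try the current plan first. On failure, try the outer plan and wrap the
  // results as outer references. If that also fails, rethrow the ORIGINAL error.
  def expand(target: String, plan: Plan, outer: Option[Plan]): Seq[Either[OuterRef, Attr]] =
    Try(expandIn(target, plan)) match {
      case Success(attrs) => attrs.map(a => Right(a))
      case Failure(original) =>
        outer.flatMap(p => Try(expandIn(target, p)).toOption) match {
          case Some(attrs) => attrs.map(a => Left(OuterRef(a)))
          case None => throw original
        }
    }
}
```

Rethrowing the first failure rather than the second keeps the error message tied to the plan the user actually wrote the star against, which appears to be the intent of "Otherwise throw the original exception" in the hunk.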
[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r658450441

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
@@ -1571,6 +1582,26 @@ class Analyzer(override val catalogManager: CatalogManager)
       }
     }

+    // Expand the star expression using the input plan first. If failed, try resolve
+    // the star expression using the outer query plan and wrap the resolved attributes
+    // in outer references. Otherwise throw the original exception.
+    private def expand(s: Star, plan: LogicalPlan): Seq[NamedExpression] = {
+      withPosition(s) {
+        try {
+          s.expand(plan, resolver)
+        } catch {
+          case e: AnalysisException =>
+            AnalysisContext.get.outerPlan.map(p =>
+              // Only a few unary nodes (Project/Aggregate) can host star expressions.
+              Try(s.expand(p.children.head, resolver)) match {

Review comment:
   I think Join can't host star expressions. Only star expressions in Project and Aggregate are handled in the Analyzer.
[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r652363922

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala ##
@@ -177,4 +178,61 @@ class ResolveSubquerySuite extends AnalysisTest {
       condition = Some(sum('a) === sum('c)))
     assertAnalysisError(plan, Seq("Invalid expressions: [sum(a), sum(c)]"))
   }
+
+  test("SPARK-35618: lateral join with star expansion") {

Review comment:
   @maropu I looked into how regex expressions are resolved, and the logic is actually different from star expressions. It doesn't throw an exception when there is no match; instead, it returns an empty sequence. So we can't tell whether the regex expression was resolved by the current plan with an empty output, or whether it can't be resolved at all.
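The ambiguity described in the comment above can be shown with a small sketch. This is an illustrative model only, not Catalyst's code: `Plan`, `expandStar`, and `expandRegex` are hypothetical, columns are modelled as plain `qualifier.name` strings, and `None` stands in for the exception a star expression raises when nothing matches.

```scala
object RegexVsStarSketch {
  // Hypothetical plan model: output columns as "qualifier.name" strings.
  final case class Plan(output: Seq[String])

  // Star-like behaviour: a failed expansion is signalled explicitly
  // (None models the exception thrown when the qualifier is unknown).
  def expandStar(qualifier: String, plan: Plan): Option[Seq[String]] = {
    val cols = plan.output.filter(_.startsWith(qualifier + "."))
    if (cols.isEmpty) None else Some(cols)
  }

  // Regex-like behaviour: no match simply yields an empty sequence,
  // so "resolved to nothing" and "could not resolve" look identical.
  def expandRegex(pattern: String, plan: Plan): Seq[String] =
    plan.output.filter(_.matches(pattern))
}
```

In this model, `expandRegex` returns an empty sequence both for a plan with no output and for a plan whose columns do not match, so a caller has no signal telling it to fall back to the outer plan. `expandStar` does provide that signal.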
[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans
allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r647164105

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala ##
@@ -791,4 +791,28 @@ class AnalysisErrorSuite extends AnalysisTest {
       assertAnalysisError(plan, s"Correlated column is not allowed in predicate ($msg)" :: Nil)
     }
   }
+
+  test("SPARK-35618: Resolve star expressions in subquery") {

Review comment:
   Yes, currently only `Filter` can host outer references for correlated subqueries, and star expansion only happens when the node is either a `Project` or an `Aggregate` (`buildExpandedProjectList`). It will be clearer with lateral subquery examples:
   ```sql
   -- t: [a, b]
   SELECT * FROM t, LATERAL (SELECT t.*)  -- <--- t.* should be resolved as t.a, t.b
   ```