[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-30 Thread GitBox


allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r662017168



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
##
@@ -807,4 +807,48 @@ class AnalysisErrorSuite extends AnalysisTest {
     // UnresolvedHint be removed by batch `Remove Unresolved Hints`
     assertAnalysisSuccess(plan, true)
   }
+
+  test("SPARK-35618: Resolve star expressions in subqueries") {
+    val a = AttributeReference("a", IntegerType)()
+    val b = AttributeReference("b", IntegerType)()
+    val t0 = OneRowRelation()
+    val t1 = LocalRelation(a, b).as("t1")
+
+    // t1.* in the subquery should be resolved into outer(t1.a) and outer(t1.b).
+    assertAnalysisError(
+      Project(ScalarSubquery(t0.select(star("t1"))).as("sub") :: Nil, t1),
+      "Scalar subquery must return only one column, but got 2" :: Nil)
+
+    // array(t1.*) in the subquery should be resolved into array(outer(t1.a), outer(t1.b))
+    val array = CreateArray(Seq(star("t1")))
+    assertAnalysisError(
+      Project(ScalarSubquery(t0.select(array)).as("sub") :: Nil, t1),
+      "Expressions referencing the outer query are not supported" :: Nil)

Review comment:
   Yes, a star in `CreateArray` is valid. I truncated this error message; the
complete one is `"Expressions referencing the outer query are not supported
outside of WHERE/HAVING clauses"`. I will update the message to be clearer.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-30 Thread GitBox


allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r662013726



##
File path: sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out
##
@@ -80,6 +80,46 @@ struct
 1  2   0   3
 
 
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*)
+-- !query schema
+struct
+-- !query output
+0  1   0   1
+1  2   1   2
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2)
+-- !query schema
+struct
+-- !query output
+0  1   0   1   0   2
+0  1   0   1   0   3
+1  2   1   2   0   2
+1  2   1   2   0   3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.* FROM t2 AS t1)
+-- !query schema
+struct
+-- !query output
+0  1   0   2
+0  1   0   3
+1  2   0   2
+1  2   0   3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2, LATERAL (SELECT t1.*, t2.*, t3.* FROM t2 AS t3))

Review comment:
   Sounds good!







[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-29 Thread GitBox


allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r660149328



##
File path: sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out
##
@@ -80,6 +80,46 @@ struct
 1  2   0   3
 
 
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*)
+-- !query schema
+struct
+-- !query output
+0  1   0   1
+1  2   1   2
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2)
+-- !query schema
+struct
+-- !query output
+0  1   0   1   0   2
+0  1   0   1   0   3
+1  2   1   2   0   2
+1  2   1   2   0   3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.* FROM t2 AS t1)
+-- !query schema
+struct
+-- !query output
+0  1   0   2
+0  1   0   3
+1  2   0   2
+1  2   0   3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2, LATERAL (SELECT t1.*, t2.*, t3.* FROM t2 AS t3))

Review comment:
   Good question. Spark can currently only resolve a subquery using its
immediate outer query. In this case, t1.* can only be resolved using the output
of t2, which is c1, c2. Many other places would need to be updated to support
nested subqueries with deep correlations.
   
https://github.com/apache/spark/blob/9c157a490bb59e02dcf44b14b411ea5beb68c238/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala#L164-L173
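
To illustrate the limitation described above, here is a minimal sketch using toy types rather than Spark's classes. `Relation`, `ResolutionError`, `expandStar` and `expandWithOuter` are hypothetical names invented for this example; the sketch only mirrors the idea of trying the subquery's own scope first and then falling back to the immediate outer scope, and it is not the actual Analyzer code.

```scala
import scala.util.{Failure, Success, Try}

object StarFallbackSketch {
  // Toy stand-ins, NOT Spark classes: a relation is a qualifier plus column names.
  final case class Relation(qualifier: String, columns: Seq[String])
  final case class ResolutionError(msg: String) extends Exception(msg)

  // Expand `target.*` against the relations visible in a single query scope.
  def expandStar(target: String, scope: Seq[Relation]): Seq[String] =
    scope.find(_.qualifier == target) match {
      case Some(r) => r.columns.map(c => s"${r.qualifier}.$c")
      case None    => throw ResolutionError(s"cannot resolve '$target.*'")
    }

  // Try the subquery's own scope first, then only the immediate outer scope.
  // Columns found in the outer scope are wrapped as outer(...). If both
  // attempts fail, the original error is rethrown.
  def expandWithOuter(
      target: String,
      inner: Seq[Relation],
      immediateOuter: Option[Seq[Relation]]): Seq[String] =
    Try(expandStar(target, inner)) match {
      case Success(cols) => cols
      case Failure(original) =>
        immediateOuter
          .flatMap(outer => Try(expandStar(target, outer)).toOption)
          .map(_.map(c => s"outer($c)"))
          .getOrElse(throw original)
    }

  def main(args: Array[String]): Unit = {
    val t1 = Relation("t1", Seq("c1", "c2"))
    val t2 = Relation("t2", Seq("c1", "c2"))
    val t3 = Relation("t3", Seq("c1", "c2"))

    // SELECT * FROM t1, LATERAL (SELECT t1.*): resolved through the outer scope.
    println(expandWithOuter("t1", Nil, Some(Seq(t1))))

    // Innermost subquery of the nested example: its own scope is t3 and its
    // immediate outer scope is t2, so t1.* stays unresolved.
    println(Try(expandWithOuter("t1", Seq(t3), Some(Seq(t2)))).isFailure)
  }
}
```

In this toy model, supporting deeper correlations would mean passing a chain of outer scopes instead of a single optional one, which hints at why more places than the star expansion itself would need to change.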







[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-28 Thread GitBox


allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r660149328



##
File path: sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out
##
@@ -80,6 +80,46 @@ struct
 1  2   0   3
 
 
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*)
+-- !query schema
+struct
+-- !query output
+0  1   0   1
+1  2   1   2
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2)
+-- !query schema
+struct
+-- !query output
+0  1   0   1   0   2
+0  1   0   1   0   3
+1  2   1   2   0   2
+1  2   1   2   0   3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.* FROM t2 AS t1)
+-- !query schema
+struct
+-- !query output
+0  1   0   2
+0  1   0   3
+1  2   0   2
+1  2   0   3
+
+
+-- !query
+SELECT * FROM t1, LATERAL (SELECT t1.*, t2.* FROM t2, LATERAL (SELECT t1.*, t2.*, t3.* FROM t2 AS t3))

Review comment:
   Good question. Spark can currently only resolve a subquery using its
immediate outer query. In this case, t1.* can only be resolved using the output
of t2, which is c1, c2. Many other places would need to be updated to support
nested subqueries with deep correlations.
   
https://github.com/apache/spark/blob/9c157a490bb59e02dcf44b14b411ea5beb68c238/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala#L164-L173







[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-25 Thread GitBox


allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r658953235



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1571,6 +1582,26 @@ class Analyzer(override val catalogManager: CatalogManager)
       }
     }
 
+    // Expand the star expression using the input plan first. If failed, try resolve
+    // the star expression using the outer query plan and wrap the resolved attributes
+    // in outer references. Otherwise throw the original exception.
+    private def expand(s: Star, plan: LogicalPlan): Seq[NamedExpression] = {
+      withPosition(s) {
+        try {
+          s.expand(plan, resolver)
+        } catch {
+          case e: AnalysisException =>
+            AnalysisContext.get.outerPlan.map(p =>
+              // Only a few unary nodes (Project/Aggregate) can host star expressions.
+              Try(s.expand(p.children.head, resolver)) match {

Review comment:
   The error message is confusing when a star is used anywhere other than in
Project and Filter. I will update CheckAnalysis.
   ```
   scala> sql("select * from t1 join t2 on t1.* = t2.c1")
   org.apache.spark.sql.AnalysisException: Invalid call to dataType on unresolved object;
   'Project [*]
   +- 'Join Inner, (ArrayBuffer(t1).* = c1#76)
      :- SubqueryAlias spark_catalog.default.t1
      :  +- View (`default`.`t1`, [c1#72,c2#73])
      :     +- Project [cast(col1#74 as int) AS c1#72, cast(col2#75 as int) AS c2#73]
      :        +- LocalRelation [col1#74, col2#75]
      +- SubqueryAlias spark_catalog.default.t2
         +- View (`default`.`t2`, [c1#76,c2#77])
            +- Project [cast(col1#78 as int) AS c1#76, cast(col2#79 as int) AS c2#77]
               +- LocalRelation [col1#78, col2#79]
   ```







[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-24 Thread GitBox


allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r658450441



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1571,6 +1582,26 @@ class Analyzer(override val catalogManager: CatalogManager)
       }
     }
 
+    // Expand the star expression using the input plan first. If failed, try resolve
+    // the star expression using the outer query plan and wrap the resolved attributes
+    // in outer references. Otherwise throw the original exception.
+    private def expand(s: Star, plan: LogicalPlan): Seq[NamedExpression] = {
+      withPosition(s) {
+        try {
+          s.expand(plan, resolver)
+        } catch {
+          case e: AnalysisException =>
+            AnalysisContext.get.outerPlan.map(p =>
+              // Only a few unary nodes (Project/Aggregate) can host star expressions.
+              Try(s.expand(p.children.head, resolver)) match {

Review comment:
   I think `Join` can't host star expressions. Only star expressions in
`Project` and `Aggregate` are handled in the Analyzer.







[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-15 Thread GitBox


allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r652363922



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala
##
@@ -177,4 +178,61 @@ class ResolveSubquerySuite extends AnalysisTest {
       condition = Some(sum('a) === sum('c)))
     assertAnalysisError(plan, Seq("Invalid expressions: [sum(a), sum(c)]"))
   }
+
+  test("SPARK-35618: lateral join with star expansion") {

Review comment:
   @maropu I looked into how regex expressions are resolved and the logic 
is actually different from star expressions. It won't throw exceptions when 
there is no match. Instead, it returns an empty sequence. So we can't tell if 
the regex expression is resolved by the current plan with an empty output, or 
it can't be resolved.
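
The difference can be sketched with a toy model (this is not Spark code; `expandStar` and `expandRegex` are hypothetical helpers written only for illustration). The star helper signals a failed lookup with an exception, so a caller can catch it and retry against the outer plan, while the regex helper returns an empty sequence both when the plan has no output and when nothing matches.

```scala
import scala.util.Try

object RegexVsStarSketch {
  // Toy star expansion: throws when the qualifier is unknown, so the caller
  // has a clear signal that a fallback to the outer plan is worth trying.
  def expandStar(target: String, output: Map[String, Seq[String]]): Seq[String] =
    output.getOrElse(
      target,
      throw new NoSuchElementException(s"cannot resolve '$target.*'"))

  // Toy column-regex expansion: returns every matching column, possibly none.
  def expandRegex(pattern: String, columns: Seq[String]): Seq[String] =
    columns.filter(_.matches(pattern))

  def main(args: Array[String]): Unit = {
    val emptyOutput = Seq.empty[String] // plan with no output columns
    val otherOutput = Seq("a", "b")     // plan with columns, none matching

    // Both calls yield an empty result, so the two situations look identical.
    println(expandRegex("c.*", emptyOutput) == expandRegex("c.*", otherOutput))

    // The star lookup fails loudly instead, which a fallback can react to.
    println(Try(expandStar("t1", Map("t2" -> Seq("t2.c1", "t2.c2")))).isFailure)
  }
}
```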







[GitHub] [spark] allisonwang-db commented on a change in pull request #32787: [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

2021-06-07 Thread GitBox


allisonwang-db commented on a change in pull request #32787:
URL: https://github.com/apache/spark/pull/32787#discussion_r647164105



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
##
@@ -791,4 +791,28 @@ class AnalysisErrorSuite extends AnalysisTest {
       assertAnalysisError(plan, s"Correlated column is not allowed in predicate ($msg)" :: Nil)
     }
   }
+
+  test("SPARK-35618: Resolve star expressions in subquery") {

Review comment:
   Yes, currently only `Filter` can host outer references for correlated
subqueries, and star expansion only happens when the node is either a `Project`
or an `Aggregate` (`buildExpandedProjectList`). Lateral subquery examples make
this clearer:
   ```sql
   -- t: [a, b]
   -- t.* should be resolved as t.a, t.b
   SELECT * FROM t, LATERAL (SELECT t.*)
   ```
   



