allisonwang-db opened a new pull request #32303: URL: https://github.com/apache/spark/pull/32303
### What changes were proposed in this pull request?

This PR adds support for lateral subqueries. A lateral subquery is a subquery in the FROM clause that is preceded by the `LATERAL` keyword and can reference columns from the preceding FROM items. For example:

```sql
SELECT * FROM t1, LATERAL (SELECT * FROM t2 WHERE t1.a = t2.c)
```

A new join type, `LateralJoin`, is introduced to represent a join with a lateral subquery. Currently the `INNER`, `CROSS` and `LEFT OUTER` join types are supported with lateral joins. Here is the analyzed plan for the above query:

```scala
Project [a, b, c, d]
+- Join LateralJoin(Inner)
   :- Relation t1
   +- Project [c, d]
      +- Filter (outer(a) = c)
         +- Relation t2
```

Similar to a correlated subquery, a lateral join can be viewed as a dependent (nested loop) join where the evaluation of the right subtree depends on the current value of the left subtree. The same technique used to decorrelate a subquery is used to decorrelate a lateral join:

```scala
Project [a, b, c, d]
+- Join LateralJoin(Inner) (a = c)  // pull up correlated predicates as join conditions
   :- Relation t1
   +- Project [c, d]
      +- Relation t2
```

The lateral join can then be rewritten into a normal join:

```scala
Project [a, b, c, d]
+- Join Inner (a = c)
   :- Relation t1
   +- Project [c, d]
      +- Relation t2
```

#### What is not supported:

- Correlation in the right subtree of a lateral join. This means a lateral subquery cannot contain another lateral or correlated subquery.

Other limitations are the same as those of correlated subqueries, since both use the same decorrelation framework.

**Note:** similar to rewriting correlated scalar subqueries, rewriting lateral joins is also subject to the COUNT bug (see SPARK-15370 for more details). This is **not** handled in the current PR because it requires a sizeable amount of refactoring. It will be addressed in a subsequent PR.

### Why are the changes needed?

To support an ANSI SQL feature.

### Does this PR introduce _any_ user-facing change?

Yes.
It allows users to use the `LATERAL` keyword in the FROM clause of a query.

### How was this patch tested?

- Parser test: `PlanParserSuite.scala`
- Analyzer test: `ResolveLateralJoinSuite.scala`
- Optimizer test: `PullupCorrelatedPredicatesSuite.scala`
- SQL test: `join-lateral.sql`
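
For reviewers who want an intuition for the rewrite, the dependent (nested loop) view of a lateral join and its decorrelated inner-join form can be sketched in plain Python. This is an illustration only, not Spark code: `lateral_join`, `inner_join`, and the sample rows for `t1` and `t2` are hypothetical names and data made up for this sketch.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

def lateral_join(left: List[Row], subquery: Callable[[Row], List[Row]]) -> List[Row]:
    """Dependent nested-loop join: re-evaluate the subquery for each left row."""
    out = []
    for outer_row in left:
        # The right side can see the current left row (the "outer" reference).
        for inner_row in subquery(outer_row):
            out.append({**outer_row, **inner_row})
    return out

def inner_join(left: List[Row], right: List[Row],
               cond: Callable[[Row, Row], bool]) -> List[Row]:
    """Ordinary inner join: the right side is independent of the left row."""
    return [{**l, **r} for l in left for r in right if cond(l, r)]

# Hypothetical sample data for t1(a, b) and t2(c, d).
t1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
t2 = [{"c": 1, "d": 100}, {"c": 1, "d": 101}, {"c": 3, "d": 300}]

# SELECT * FROM t1, LATERAL (SELECT * FROM t2 WHERE t1.a = t2.c)
lateral_rows = lateral_join(t1, lambda outer: [r for r in t2 if outer["a"] == r["c"]])

# Decorrelated form: the correlated predicate becomes the join condition.
joined_rows = inner_join(t1, t2, lambda l, r: l["a"] == r["c"])

assert lateral_rows == joined_rows
```

The sketch covers only the inner case with a simple equality predicate. It does not model aggregates in the subquery, so it says nothing about the COUNT bug mentioned in the description.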