allisonwang-db opened a new pull request #32303:
URL: https://github.com/apache/spark/pull/32303


   ### What changes were proposed in this pull request?
   This PR adds support for lateral subqueries. A lateral subquery is a 
subquery preceded by the `LATERAL` keyword in the FROM clause of a query that 
can reference columns in the preceding FROM items. For example:
   ```sql
   SELECT * FROM t1, LATERAL (SELECT * FROM t2 WHERE t1.a = t2.c)
   ```
   A new join type, `LateralJoin`, is introduced to represent a join with a 
lateral subquery. Currently, lateral joins support the `INNER`, `CROSS` and 
`LEFT OUTER` join types. Here is the analyzed plan for the above query:
   ```scala
   Project [a, b, c, d]
   :- Join LateralJoin(Inner)
      :- Relation t1
      +- Project [c, d]
         +- Filter (outer(a) = c)
            +- Relation t2
   ```
   Similar to a correlated subquery, a lateral join can be viewed as a 
dependent (nested loop) join where the evaluation of the right subtree depends 
on the current value of the left subtree. The same technique used to 
decorrelate a subquery is applied to decorrelate a lateral join:
   ```scala
   Project [a, b, c, d]
   :- Join LateralJoin(Inner) (a = c)  // pull up correlated predicates as join conditions
      :- Relation t1
      +- Project [c, d]
         +- Relation t2
   ```
   Then a lateral join can be rewritten into a normal join:
   ```scala
   Project [a, b, c, d]
   :- Join Inner (a = c)
      :- Relation t1
      +- Project [c, d]
         +- Relation t2
   ```
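
   To illustrate why the dependent evaluation and the rewritten join agree for an inner lateral join, here is a minimal plain-Python sketch. It is not Spark code: the sample rows and the function names `lateral_join_dependent` and `lateral_join_decorrelated` are hypothetical, and the tables are modeled as lists of dicts.

```python
# Hypothetical sample rows; the PR does not specify table contents.
t1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
t2 = [{"c": 1, "d": "x"}, {"c": 1, "d": "y"}, {"c": 3, "d": "z"}]


def lateral_join_dependent(left, right):
    """Dependent (nested loop) evaluation: re-run the subquery per left row."""
    out = []
    for l in left:
        # Subquery body, with the outer reference t1.a bound to l["a"].
        subquery_rows = [r for r in right if l["a"] == r["c"]]
        out.extend({**l, **r} for r in subquery_rows)
    return out


def lateral_join_decorrelated(left, right):
    """Decorrelated form: the predicate a = c is an ordinary join condition."""
    # The right side is now independent of the left, so it is indexed once.
    by_c = {}
    for r in right:
        by_c.setdefault(r["c"], []).append(r)
    return [{**l, **r} for l in left for r in by_c.get(l["a"], [])]


dependent = lateral_join_dependent(t1, t2)
decorrelated = lateral_join_decorrelated(t1, t2)
```

   In this model both functions return the same rows. The practical difference is that the decorrelated form no longer needs the left row to evaluate the right side, which is what lets it be planned as a normal join.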
   #### What is not supported:
   - Correlation in the right subtree of a lateral join. This means a lateral 
subquery cannot contain another lateral or correlated subquery.
   
   Other limitations are the same as those of correlated subqueries, since 
both use the same decorrelation framework.
   
   **Note:** Similar to rewriting correlated scalar subqueries, rewriting 
lateral joins is also subject to the COUNT bug (see SPARK-15370 for more 
details). This is **not** handled in the current PR because it requires a 
sizeable amount of refactoring. It will be addressed in a subsequent PR.
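
   For readers unfamiliar with the COUNT bug, the following plain-Python sketch shows the general problem on a lateral subquery of the form `SELECT COUNT(*) FROM t2 WHERE t1.a = t2.c`. It is only an illustration under assumed data, not the PR's implementation: the rows and the names `count_dependent` and `count_naive_rewrite` are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; a = 2 has no matching row in t2.
t1 = [{"a": 1}, {"a": 2}, {"a": 3}]
t2 = [{"c": 1}, {"c": 1}, {"c": 3}]


def count_dependent(left, right):
    """Reference semantics: evaluate COUNT(*) once per left row."""
    return [
        {"a": l["a"], "cnt": sum(1 for r in right if r["c"] == l["a"])}
        for l in left
    ]


def count_naive_rewrite(left, right):
    """Naive decorrelation: aggregate t2 grouped by c, then inner join on a = c."""
    grouped = Counter(r["c"] for r in right)  # groups exist only for present keys
    return [
        {"a": l["a"], "cnt": grouped[l["a"]]}
        for l in left
        if l["a"] in grouped  # inner join drops left rows with no group
    ]


expected = count_dependent(t1, t2)
naive = count_naive_rewrite(t1, t2)
```

   Here `expected` contains a row for `a = 2` with a count of 0, while `naive` has no row for `a = 2` at all: an aggregate over an empty input still produces one row, but the grouped aggregate followed by an inner join does not.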
   
   ### Why are the changes needed?
   To support an ANSI SQL feature.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. It allows users to use the `LATERAL` keyword in the FROM clause of a 
query.
   
   ### How was this patch tested?
   - Parser test: `PlanParserSuite.scala`
   - Analyzer test: `ResolveLateralJoinSuite.scala`
   - Optimizer test: `PullupCorrelatedPredicatesSuite.scala`
   - SQL test: `join-lateral.sql`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


