neilconway opened a new pull request, #21202:
URL: https://github.com/apache/datafusion/pull/21202

   ## Which issue does this PR close?
   
   - Closes #10048.
   
   ## Rationale for this change
   
   Lateral joins are a commonly used SQL feature that allows the right-side 
join relation to access columns from the left side of the join. Like correlated 
subqueries, two popular evaluation strategies are nested loops (re-evaluate the 
right side of the join for each row of the left join input) and decorrelation 
(rewrite the right join input to remove the 
correlation, converting the lateral join into a standard join with the 
correlation predicates as join conditions). Decorrelation is typically much 
faster because the right side is evaluated once rather than re-executed for 
every row of the left input.
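
The two strategies can be sketched outside of any query engine. The following is a minimal Python illustration, not DataFusion code: the data and the function names `lateral_nested_loop` and `lateral_decorrelated` are made up for this example. It models a query of the shape `SELECT * FROM l, LATERAL (SELECT value FROM r WHERE r.left_id = l.id)`.

```python
from collections import defaultdict

# Hypothetical inputs: left rows are ids, right rows are (left_id, value).
left = [1, 2, 3]
right = [(1, 10), (1, 20), (3, 30)]


def lateral_nested_loop(left, right):
    """Nested loops: re-evaluate the right side once per left row."""
    out = []
    for l_id in left:
        # The "subquery" rescans the right input for every left row.
        for r_id, value in right:
            if r_id == l_id:  # correlation predicate
                out.append((l_id, value))
    return out


def lateral_decorrelated(left, right):
    """Decorrelation: evaluate the right side once, then join on the key."""
    by_key = defaultdict(list)
    for r_id, value in right:  # single pass over the right input
        by_key[r_id].append(value)
    # The correlation predicate has become an ordinary equi-join condition.
    return [(l_id, value) for l_id in left for value in by_key.get(l_id, [])]


# Both strategies produce the same rows for this input.
assert lateral_nested_loop(left, right) == lateral_decorrelated(left, right)
```

The nested-loop version scans the right input once per left row, while the decorrelated version scans it once in total, which is the source of the speedup described above.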
   
   Previously, DataFusion had some support for evaluating lateral joins via 
decorrelation, but it was not functional. This PR fixes and extends the 
existing code to make basic lateral joins functional, although several notable 
TODOs remain. This PR also adds a suite of SLT tests for lateral joins (derived 
from the DuckDB and Postgres tests), covering both implemented and 
to-be-implemented behavior.
   
   Notable TODOs:
   * LATERAL subqueries with HAVING clauses (#21198)
   * LEFT JOIN LATERAL (#21199)
   * LATERAL subqueries with outer relation references outside the WHERE clause 
(#21201)
   
   ## What changes are included in this PR?
   
   * Match query structure properly (unwrap `SubqueryAlias`) so that lateral 
subqueries are recognized even if they have aliases
   * Handle nested LATERAL clauses; each LATERAL can only reference sibling 
outer relations
   * Properly handle "the count bug", following similar logic to what we do for 
this case with correlated subqueries
   * Remove a `todo!` panic in the physical planner if a `Subquery` node is 
seen; these just represent a subquery structure we aren't able to decorrelate 
yet
   * Properly raise an error and bail out for LATERAL subqueries with HAVING 
clauses
   * Add SLT test suite for lateral joins (~33 queries), based in part on 
DuckDB and Postgres test suites
   * Update expected EXPLAIN output in various places
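
For readers unfamiliar with "the count bug" mentioned in the list above, here is a minimal Python sketch of the general problem. It is an illustration under assumptions, not the logic in this PR: the data and function names are hypothetical, and the fix shown (outer join plus a default of 0) is the textbook remedy rather than a description of DataFusion's implementation. It models a query of the shape `SELECT * FROM l, LATERAL (SELECT count(*) FROM r WHERE r.left_id = l.id)`.

```python
from collections import Counter

# Hypothetical inputs: left id 2 has no matching rows on the right.
left = [1, 2, 3]
right_keys = [1, 1, 3]


def per_row_evaluation(left, right_keys):
    """Reference semantics: count(*) over an empty input is 0, not "no row"."""
    return [(l_id, sum(1 for k in right_keys if k == l_id)) for l_id in left]


def naive_decorrelation(left, right_keys):
    """Group the right side by key, then inner join: loses unmatched left rows."""
    counts = Counter(right_keys)  # no group exists for id 2
    return [(l_id, counts[l_id]) for l_id in left if l_id in counts]


def fixed_decorrelation(left, right_keys):
    """Outer join plus a default of 0 for groups that do not exist."""
    counts = Counter(right_keys)
    return [(l_id, counts.get(l_id, 0)) for l_id in left]


assert naive_decorrelation(left, right_keys) != per_row_evaluation(left, right_keys)
assert fixed_decorrelation(left, right_keys) == per_row_evaluation(left, right_keys)
```

The naive rewrite silently drops the row for id 2, whereas per-row evaluation reports a count of 0 for it.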
   
   ## Are these changes tested?
   
   Yes; new tests added. I ran the test suite against DuckDB and confirmed that 
everything we expect to work produces the same results under DuckDB.
   
   ## Are there any user-facing changes?
   
   Yes; lateral joins now work for a wide swath of useful scenarios.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]