neilconway opened a new pull request, #21202: URL: https://github.com/apache/datafusion/pull/21202
## Which issue does this PR close?

- Closes #10048.

## Rationale for this change

Lateral joins are a commonly used SQL feature that allows the right side of a join to access columns from the left side. As with correlated subqueries, two popular evaluation strategies are nested loops (re-evaluate the right side of the join for each row of the left join input) and decorrelation (rewrite the right join input to remove the correlation, converting the lateral join into a standard join with the correlation predicates as join conditions). Decorrelation is typically much faster because the right side is evaluated once rather than re-executed for every row of the left input.

Previously, DataFusion had some support for evaluating lateral joins via decorrelation, but it was not functional. This PR fixes and extends the existing code to make basic lateral joins functional, although several notable TODOs remain. This PR also adds a suite of SLT tests for lateral joins (derived from the DuckDB and Postgres tests), covering both implemented and to-be-implemented behavior.

Notable TODOs:

* LATERAL subqueries with HAVING clauses (#21198)
* LEFT JOIN LATERAL (#21199)
* LATERAL subqueries with outer relation references outside the WHERE clause (#21201)

## What changes are included in this PR?

* Match query structure properly (unwrap `SubqueryAlias`) so that lateral subqueries are recognized properly, even if they have aliases
* Handle nested LATERAL clauses; each LATERAL can only reference sibling outer relations
* Properly handle "the count bug", following similar logic to what we do for this case with correlated subqueries
* Remove a `todo!` panic in the physical planner if a `Subquery` node is seen; these just represent a subquery structure we aren't able to decorrelate yet
* Properly raise an error and bail out for LATERAL subqueries with HAVING clauses
* Add SLT test suite for lateral joins (~33 queries), based in part on DuckDB and Postgres test suites
* Update expected EXPLAIN output in various places

## Are these changes tested?

Yes; new tests added. I ran the test suite against DuckDB and confirmed that everything we expect to work produces the same results under DuckDB.

## Are there any user-facing changes?

Yes; lateral joins now work for a wide swath of useful scenarios.
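
For readers unfamiliar with the two evaluation strategies and "the count bug" mentioned above, here is a minimal Python sketch. It is not DataFusion code: the tables `t1` and `t2`, the sample data, and the function names are all hypothetical, and the "fixed" variant shows one common way to avoid the count bug (an outer join with a default of 0), not necessarily the exact logic in this PR. The query being modeled is roughly `SELECT t1.id, sub.cnt FROM t1, LATERAL (SELECT count(*) AS cnt FROM t2 WHERE t2.t1_id = t1.id) AS sub`.

```python
from collections import Counter

# Hypothetical sample data: t1(id) and t2(t1_id), one value per row.
t1 = [1, 2, 3]
t2 = [1, 1, 3]


def lateral_nested_loops(left, right):
    """Reference semantics: re-evaluate the subquery for every left row."""
    return [(l, sum(1 for r in right if r == l)) for l in left]


def lateral_decorrelated_naive(left, right):
    """Aggregate the right side once (GROUP BY t1_id), then inner join.

    This exhibits the count bug: left rows with no match disappear,
    although count(*) over an empty input should yield 0.
    """
    counts = Counter(right)
    return [(l, counts[l]) for l in left if l in counts]


def lateral_decorrelated_fixed(left, right):
    """Aggregate once, then left join and replace a missing count with 0."""
    counts = Counter(right)
    return [(l, counts.get(l, 0)) for l in left]


nested = lateral_nested_loops(t1, t2)
naive = lateral_decorrelated_naive(t1, t2)
fixed = lateral_decorrelated_fixed(t1, t2)

print(nested)  # [(1, 2), (2, 0), (3, 1)]
print(naive)   # [(1, 2), (3, 1)]  (row for id 2 is lost)
print(fixed)   # [(1, 2), (2, 0), (3, 1)]
```

The nested-loops version scans `t2` once per row of `t1`, while both decorrelated versions scan it a single time; only the fixed variant also matches the nested-loops result when a left row has no matching right rows.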
