neilconway opened a new pull request, #22534:
URL: https://github.com/apache/datafusion/pull/22534

   EliminateOuterJoin previously only matched the literal Filter -> Join 
pattern. When a Projection sits between the Filter and the Join, the rule 
no-ops and the outer join stays in place even when the predicate above the 
projection would justify converting it.
   
   A common shape that hits this comes from projection pruning after filter 
pushdown. In TPC-DS q49, PushDownFilter moves the returns-side predicate above 
the sales/returns LEFT JOIN, then OptimizeProjections inserts a pruning 
Projection between that Filter and the LEFT JOIN. The returns-side predicate 
still filters out the outer rows, but the projection hides the join from the 
old rule.
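
The reason a returns-side predicate justifies the conversion can be sketched as a small decision table. This is a simplified, standalone illustration, not the code in this PR: `JoinType` here is a local toy enum, and the two boolean flags (hypothetical names) stand for "the filter cannot be true when all columns of that side are NULL".

```rust
/// Toy join type; a local stand-in, not DataFusion's own type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Decide which join type remains once we know which sides the filter
/// above the join is null-rejecting on.
///
/// `rejects_null_left` / `rejects_null_right` are assumed inputs: each is
/// true when the filter discards every row whose columns from that side
/// are all NULL (that is, the null-extended rows an outer join produces).
fn simplify_join_type(
    join_type: JoinType,
    rejects_null_left: bool,
    rejects_null_right: bool,
) -> JoinType {
    match join_type {
        // A LEFT JOIN null-extends the right side; if those rows are
        // filtered out anyway, the join behaves like an INNER JOIN.
        JoinType::Left if rejects_null_right => JoinType::Inner,
        // Symmetric case for RIGHT JOIN.
        JoinType::Right if rejects_null_left => JoinType::Inner,
        // A FULL JOIN null-extends both sides, so each flag removes one
        // direction of row preservation.
        JoinType::Full => match (rejects_null_left, rejects_null_right) {
            (true, true) => JoinType::Inner,
            (true, false) => JoinType::Left,
            (false, true) => JoinType::Right,
            (false, false) => JoinType::Full,
        },
        // Nothing to simplify.
        other => other,
    }
}
```

In the q49 shape described above, the join is a LEFT JOIN and the filter is null-rejecting on the returns (right) side, so `simplify_join_type(JoinType::Left, false, true)` yields `JoinType::Inner`.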
   
   Extend the rule to walk down through Projection nodes between Filter and 
Join, rewriting a working copy of the predicate into the join's coordinate 
space for analysis. The rewritten predicate is used only for analysis; the 
original predicate and surrounding plan structure are preserved on success.
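
A minimal sketch of the rewriting idea, assuming a toy expression type rather than DataFusion's real `Expr` and plan APIs. The names `ProjItem`, `substitute`, and `rewrite_through_projections` are hypothetical and chosen only for illustration. Each projection maps output names to expressions over its input, so a predicate written against the projection's output can be translated by substituting every column reference with the expression that produces it.

```rust
use std::collections::HashMap;

/// Toy expression tree; a stand-in for a real optimizer expression type.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

/// One projection output: `expr AS name`, where `expr` is over the
/// projection's input columns.
struct ProjItem {
    name: String,
    expr: Expr,
}

/// Replace each column reference in `expr` with the input expression that
/// produces it. Returns `None` if a column is not a projection output.
fn substitute(expr: &Expr, outputs: &HashMap<&str, &Expr>) -> Option<Expr> {
    match expr {
        Expr::Column(name) => {
            let input_expr: &Expr = outputs.get(name.as_str())?;
            Some(input_expr.clone())
        }
        Expr::Literal(v) => Some(Expr::Literal(*v)),
        Expr::Gt(l, r) => Some(Expr::Gt(
            Box::new(substitute(l, outputs)?),
            Box::new(substitute(r, outputs)?),
        )),
        Expr::Add(l, r) => Some(Expr::Add(
            Box::new(substitute(l, outputs)?),
            Box::new(substitute(r, outputs)?),
        )),
    }
}

/// Walk a stack of projections from top (nearest the filter) to bottom
/// (nearest the join), rewriting a working copy of the predicate at each
/// step. The caller's `pred` is left untouched.
fn rewrite_through_projections(
    pred: &Expr,
    projections: &[Vec<ProjItem>],
) -> Option<Expr> {
    let mut current = pred.clone();
    for proj in projections {
        let outputs: HashMap<&str, &Expr> =
            proj.iter().map(|p| (p.name.as_str(), &p.expr)).collect();
        current = substitute(&current, &outputs)?;
    }
    Some(current)
}
```

For example, with a single projection `r.return_amt AS ret_amt`, the predicate `ret_amt > 100` is rewritten to `r.return_amt > 100`, which can then be checked against the join's own columns. Returning `None` models the conservative choice of leaving the plan unchanged when the predicate cannot be translated.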
   
   Tests cover passthrough projection, aliased projection, negative cases, a 
non-transparent Limit guard, and SQL-level q49-shaped cases where 
OptimizeProjections places a pruning Projection between a returns-side Filter 
and the sales/returns LEFT JOIN.
   
   ## Which issue does this PR close?
   
   - Closes #22531.
   
   ## Rationale for this change
   
   `EliminateOuterJoin` previously looked for plans with a `Filter` directly 
above a `Join`. For most queries, that is the right plan shape to look for 
(because `PushDownFilter` will typically place the filters that are useful for 
outer join elimination directly on top of the relevant `Join`). However, some 
plans don't follow this shape, for at least two reasons:
   
   1. Volatile expressions can interfere with filter pushdown
   2. `OptimizeProjections` might insert a `Projection` between the `Filter` 
and `Join`
   
   Notably, we run into case (2) in TPC-DS Q49; we currently fail to convert 
three outer joins to inner joins for that reason.
   
   We can handle this by teaching `EliminateOuterJoin` to descend through one 
or more intermediate `Projection` nodes, rewriting the filter predicate as it 
goes to account for the effect of each projection.
   
   ## What changes are included in this PR?
   
   * Teach `EliminateOuterJoin` to descend through one or more `Projection` 
nodes
   * Refactor code in `eliminate_outer_join.rs` and improve its comments
   * Add unit tests
   * Add SLT tests
   
   ## Are these changes tested?
   
   Yes, new tests were added. I also manually verified that, without this 
change, we fail to eliminate the outer joins in TPC-DS Q49, and that we 
succeed in doing so with this change.
   
   ## Are there any user-facing changes?
   
   The optimizer now eliminates outer joins in more plans, namely those where 
a `Projection` sits between the `Filter` and the `Join`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

