adriangb opened a new pull request, #20303:
URL: https://github.com/apache/datafusion/pull/20303

   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion/issues/20213
   
   ## Rationale for this change
   
   When both sides of a join have columns with the same name (e.g. `k`), the 
dynamic filter from an enclosing (parent) join was incorrectly pushed to 
**both** children instead of only the child that produces the referenced 
column. With small row groups this caused **wrong results** (0 rows instead 
of the expected result set).
   
   The root cause was that `FilterColumnChecker` in `filter_pushdown.rs` 
matched columns by **name only**. When the parent pushed a filter referencing 
column `k` at index 2 (the right child's `k`), the name-based check found `k` 
in the left child's schema too, and incorrectly pushed the filter to both sides.
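
   To illustrate the root cause, here is a minimal, self-contained sketch. It does not use DataFusion's real types: `ColumnRef`, `matches_by_name`, and `matches_by_name_and_index` are hypothetical stand-ins, and the schema layout (left child `k@0, v@1`, right child `k@2, w@3`) is an assumed example consistent with the description above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a physical column reference: a name plus the
/// column's position in the join's output schema.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ColumnRef {
    name: String,
    index: usize,
}

impl ColumnRef {
    fn new(name: &str, index: usize) -> Self {
        Self { name: name.to_string(), index }
    }
}

/// Name-only check (the behaviour described as buggy): a filter is considered
/// pushable to a child if every referenced name exists in that child.
fn matches_by_name(filter_cols: &[ColumnRef], child_cols: &[ColumnRef]) -> bool {
    let names: HashSet<&str> = child_cols.iter().map(|c| c.name.as_str()).collect();
    filter_cols.iter().all(|c| names.contains(c.name.as_str()))
}

/// `(name, index)` check: a filter is pushable only if every referenced
/// column matches an allowed column on both name and position.
fn matches_by_name_and_index(filter_cols: &[ColumnRef], allowed: &[ColumnRef]) -> bool {
    let pairs: HashSet<(&str, usize)> =
        allowed.iter().map(|c| (c.name.as_str(), c.index)).collect();
    filter_cols
        .iter()
        .all(|c| pairs.contains(&(c.name.as_str(), c.index)))
}
```

   With a filter that references `k@2`, `matches_by_name` accepts the left child (it also has a column named `k`), while `matches_by_name_and_index` accepts only the right child.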
   
   ## What changes are included in this PR?
   
   Approach adopted from #20192:
   
   1. **`FilterColumnChecker`**: now matches on `(name, index)` pairs instead 
of just names, preventing incorrect cross-side pushdown when columns share names
   2. **`ChildFilterDescription::from_child_with_allowed_columns`**: new 
method that restricts pushdown to an explicit set of allowed `(name, index)` 
pairs
   3. **`ChildFilterDescription::all_unsupported`**: helper that marks all 
filters as unsupported for a child
   4. **`HashJoinExec::gather_filters_for_pushdown`**: builds per-side 
allowed-column sets from `column_indices` (plus the optional `projection`) and 
uses `lr_is_preserved` to gate pushdown eligibility per join type
   5. **`lr_is_preserved`**: mirrors the logical optimizer's preserved-side 
logic, enabling parent filter pushdown for non-inner join types (Left, Right, 
Semi, Anti, Mark)
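
   Items 4 and 5 can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: `JoinType`, `JoinSide`, and `ColumnIndex` are simplified local stand-ins for the DataFusion types of the same names, `allowed_output_positions` is a hypothetical helper, and the preserved-side table reflects my understanding of the logical optimizer's rule, so the actual implementation may differ in detail. It also assumes that `projection[out]` gives the position in `column_indices` that output column `out` reads from.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinType {
    Inner, Left, Right, Full,
    LeftSemi, RightSemi, LeftAnti, RightAnti, LeftMark, RightMark,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinSide { Left, Right }

/// Stand-in for one entry of `column_indices`: which child an output column
/// comes from, and its index within that child's schema.
struct ColumnIndex {
    index: usize,
    side: JoinSide,
}

/// Returns `(left_preserved, right_preserved)`. A side is preserved when the
/// join never null-pads its rows, so a parent filter on that side's columns
/// can be evaluated below the join without changing the result.
fn lr_is_preserved(join_type: JoinType) -> (bool, bool) {
    match join_type {
        JoinType::Inner => (true, true),
        JoinType::Left => (true, false),
        JoinType::Right => (false, true),
        JoinType::Full => (false, false),
        // The output of these joins only exposes left-side columns
        // (plus the mark column for LeftMark).
        JoinType::LeftSemi | JoinType::LeftAnti | JoinType::LeftMark => (true, false),
        // Mirror image: only right-side columns are exposed.
        JoinType::RightSemi | JoinType::RightAnti | JoinType::RightMark => (false, true),
    }
}

/// Hypothetical helper: the join-output positions that originate from `side`,
/// taking an optional projection into account.
fn allowed_output_positions(
    column_indices: &[ColumnIndex],
    projection: Option<&[usize]>,
    side: JoinSide,
) -> Vec<usize> {
    match projection {
        Some(proj) => proj
            .iter()
            .enumerate()
            .filter_map(|(out, &src)| (column_indices[src].side == side).then_some(out))
            .collect(),
        None => column_indices
            .iter()
            .enumerate()
            .filter_map(|(out, ci)| (ci.side == side).then_some(out))
            .collect(),
    }
}
```

   In this sketch, a child would receive parent filters only if its side is preserved for the join type and every column the filter references falls within that side's allowed positions; otherwise all parent filters would be marked unsupported for that child.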
   
   ## Are these changes tested?
   
   Yes:
   
   - **Unit test** for `lr_is_preserved` covering all join types
   - **SLT regression test** reproducing the exact issue #20213 scenario: 
subquery join with same-named columns, small row groups, verifying both 
`COUNT(*)` and `SELECT *` correctness
   - **Updated snapshot** for existing Left join filter pushdown test 
(preserved-side filter now correctly pushes down)
   - All existing hash join tests (368), filter pushdown tests (47), and SLT 
tests pass
   
   ## Are there any user-facing changes?
   
   - **Bug fix**: Queries with nested joins where both sides have same-named 
columns now return correct results with dynamic filter pushdown enabled
   - **Improvement**: Parent filters on preserved join sides can now push 
through non-inner joins (Left, Right, Semi, Anti, Mark)
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

