adriangb opened a new pull request, #20303:
URL: https://github.com/apache/datafusion/pull/20303
## Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/20213

## Rationale for this change

When both sides of a join have columns with the same name (e.g. `k`), the dynamic filter from an outer join was incorrectly pushed to **both** children instead of only the correct one. With small row groups this caused **wrong results** (0 rows instead of the expected result set).

The root cause was that `FilterColumnChecker` in `filter_pushdown.rs` matched columns by **name only**. When the parent pushed a filter referencing column `k` at index 2 (the right child's `k`), the name-based check also found `k` in the left child's schema and incorrectly pushed the filter to both sides.

## What changes are included in this PR?

Approach adopted from #20192:

1. **`FilterColumnChecker`** now matches on `(name, index)` pairs instead of names alone, preventing incorrect cross-side pushdown when columns share names.
2. **`ChildFilterDescription::from_child_with_allowed_columns`**: new method that restricts pushdown to an explicit set of allowed `(name, index)` pairs.
3. **`ChildFilterDescription::all_unsupported`**: helper that marks all filters as unsupported for a child.
4. **`HashJoinExec::gather_filters_for_pushdown`**: builds per-side allowed-column sets from `column_indices` (plus the optional `projection`) and uses `lr_is_preserved` to gate pushdown eligibility per join type.
5. **`lr_is_preserved`**: mirrors the logical optimizer's preserved-side logic, enabling parent filter pushdown for non-inner join types (Left, Right, Semi, Anti, Mark).

## Are these changes tested?
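The behaviour these tests pin down can be illustrated with a standalone sketch. It does not use DataFusion's API: `JoinSide`, `JoinKind`, `Col`, `allowed_columns`, `pushable_by_name`, `pushable_by_name_and_index` and `can_push_to` are hypothetical simplifications of `FilterColumnChecker`, `column_indices` and `lr_is_preserved`. The preserved-side table assumes the conventional semantics (Left keeps the left input, Right the right, Full neither, Semi/Anti/Mark their own side). The example reproduces the `k`-at-index-2 case from the rationale.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the side recorded per output column in `column_indices`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum JoinSide {
    Left,
    Right,
}

/// Hypothetical, simplified join type enum (not DataFusion's `JoinType`).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum JoinKind {
    Inner, Left, Right, Full,
    LeftSemi, LeftAnti, LeftMark,
    RightSemi, RightAnti, RightMark,
}

/// (left_preserved, right_preserved), assuming the conventional preserved-side table.
fn lr_is_preserved(join: JoinKind) -> (bool, bool) {
    match join {
        JoinKind::Inner => (true, true),
        JoinKind::Left => (true, false),
        JoinKind::Right => (false, true),
        JoinKind::Full => (false, false),
        JoinKind::LeftSemi | JoinKind::LeftAnti | JoinKind::LeftMark => (true, false),
        JoinKind::RightSemi | JoinKind::RightAnti | JoinKind::RightMark => (false, true),
    }
}

/// A column in the join's output schema: (name, index).
type Col = (&'static str, usize);

/// Collect the (name, output index) pairs that originate from `side`.
fn allowed_columns(output: &[(&'static str, JoinSide)], side: JoinSide) -> HashSet<Col> {
    let mut allowed = HashSet::new();
    for (index, &(name, col_side)) in output.iter().enumerate() {
        if col_side == side {
            allowed.insert((name, index));
        }
    }
    allowed
}

/// Buggy check: a filter column is accepted if any allowed column has the same name.
fn pushable_by_name(filter_cols: &[Col], allowed: &HashSet<Col>) -> bool {
    filter_cols.iter().all(|(name, _)| allowed.iter().any(|(n, _)| n == name))
}

/// Fixed check: the exact (name, index) pair must be allowed.
fn pushable_by_name_and_index(filter_cols: &[Col], allowed: &HashSet<Col>) -> bool {
    filter_cols.iter().all(|col| allowed.contains(col))
}

/// Push a parent filter to `side` only if that side is preserved and owns every column.
fn can_push_to(join: JoinKind, side: JoinSide, filter_cols: &[Col], output: &[(&'static str, JoinSide)]) -> bool {
    let (left_ok, right_ok) = lr_is_preserved(join);
    let preserved = match side {
        JoinSide::Left => left_ok,
        JoinSide::Right => right_ok,
    };
    preserved && pushable_by_name_and_index(filter_cols, &allowed_columns(output, side))
}

fn main() {
    // Both children have a `k`: indices 0..=1 come from the left, 2..=3 from the right.
    let output = [
        ("k", JoinSide::Left),
        ("a", JoinSide::Left),
        ("k", JoinSide::Right),
        ("b", JoinSide::Right),
    ];
    // A parent filter on the right child's `k` (k@2).
    let filter: [Col; 1] = [("k", 2)];

    let left = allowed_columns(&output, JoinSide::Left);
    // Name-only matching wrongly accepts k@2 for the left child too.
    assert!(pushable_by_name(&filter, &left));
    // (name, index) matching rejects it for the left child...
    assert!(!pushable_by_name_and_index(&filter, &left));
    // ...and accepts it for the right child of an inner join.
    assert!(can_push_to(JoinKind::Inner, JoinSide::Right, &filter, &output));
    // In a Left join the right input is not preserved, so nothing is pushed there.
    assert!(!can_push_to(JoinKind::Left, JoinSide::Right, &filter, &output));
    println!("ok");
}
```

Keeping the index in the key is what separates the two `k` columns: the name-only check cannot tell `k@0` from `k@2`, while the pair check can. Gating on the preserved side is a separate, independent condition layered on top.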
Yes, with the following tests:

- **Unit test** for `lr_is_preserved` covering all join types
- **SLT regression test** reproducing the exact scenario from #20213: a subquery join with same-named columns and small row groups, verifying the correctness of both `COUNT(*)` and `SELECT *`
- **Updated snapshot** for the existing Left join filter pushdown test (the filter on the preserved side now correctly pushes down)
- All existing hash join tests (368), filter pushdown tests (47), and SLT tests pass

## Are there any user-facing changes?

- **Bug fix**: Queries with nested joins where both sides have same-named columns now return correct results with dynamic filter pushdown enabled
- **Improvement**: Parent filters on preserved join sides can now be pushed through non-inner joins (Left, Right, Semi, Anti, Mark)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
