yyanyy opened a new pull request, #55252:

URL: https://github.com/apache/spark/pull/55252
### What changes were proposed in this pull request?

Add a `pushedFilters: Seq[Expression]` field to `DataSourceV2ScanRelation`. It records which Catalyst filter expressions were fully pushed down to the data source and therefore no longer appear as post-scan `Filter` nodes.

The field is computed in `V2ScanRelationPushDown.pushDownFilters` as the set difference of the normalized input filters minus the post-scan filters, using `ExpressionSet` for canonical comparison. In `pruneColumns` it is remapped through `projectionFunc`, so that attribute references (including references into nested struct types) stay consistent with the pruned scan output. The other scan-building paths (`buildScanWithPushedAggregate`, `buildScanWithPushedJoin`, `buildScanWithPushedVariants`) use the default empty value, because their output schemas differ from the table columns.

The field is **not** yet wired into `validConstraints`. Doing so would change filter inference behavior (e.g., `InferFiltersFromConstraints` adding or removing filters, and `PruneFilters` dropping redundant post-scan filters), so it requires plan stability testing first.

### Why are the changes needed?

Once a filter is pushed into a DSv2 scan, the logical plan loses track of it: `DataSourceV2ScanRelation` contributes no constraints to the optimizer. This prevents constraint propagation (e.g., inferring filters across joins) and the identification of redundant post-scan filters. This change stores the pushed filter information on the scan relation as groundwork for future constraint propagation.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

### Was this patch authored or co-authored using generative AI tooling?

Claude Code (Opus 4.6)
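The set-difference and remapping steps described for `pushedFilters` can be illustrated with a small, self-contained Scala sketch. It does not use Spark at all: `Expr`, `Attr`, `canonicalize`, `pushedFilters` and `remap` are hypothetical stand-ins invented for illustration. `canonicalize` only reorders the operands of commutative operators, as a rough approximation of the canonical comparison `ExpressionSet` performs, and nested struct fields are not modeled.

```scala
object PushedFiltersSketch {
  // Hypothetical toy expression tree; these are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Orders the operands of a commutative operator deterministically,
  // so that `a = 1` and `1 = a` canonicalize to the same tree.
  private def ordered(x: Expr, y: Expr): (Expr, Expr) =
    if (x.toString.compareTo(y.toString) <= 0) (x, y) else (y, x)

  // Simplified stand-in for canonical comparison (assumption: only
  // commutative operand order is normalized here).
  def canonicalize(e: Expr): Expr = e match {
    case EqualTo(l, r) =>
      val (a, b) = ordered(canonicalize(l), canonicalize(r))
      EqualTo(a, b)
    case And(l, r) =>
      val (a, b) = ordered(canonicalize(l), canonicalize(r))
      And(a, b)
    case GreaterThan(l, r) => GreaterThan(canonicalize(l), canonicalize(r))
    case leaf => leaf
  }

  // Pushed filters = input filters minus those still evaluated after the scan,
  // compared by canonical form rather than by exact tree shape.
  def pushedFilters(normalizedFilters: Seq[Expr], postScanFilters: Seq[Expr]): Seq[Expr] = {
    val postScan = postScanFilters.map(canonicalize).toSet
    normalizedFilters.filterNot(f => postScan.contains(canonicalize(f)))
  }

  // Rewrites attribute references after column pruning, e.g. mapping a table
  // column to the matching attribute of the pruned scan output.
  def remap(e: Expr, projection: Map[Attr, Expr]): Expr = e match {
    case a: Attr           => projection.getOrElse(a, a)
    case lit: Lit          => lit
    case EqualTo(l, r)     => EqualTo(remap(l, projection), remap(r, projection))
    case GreaterThan(l, r) => GreaterThan(remap(l, projection), remap(r, projection))
    case And(l, r)         => And(remap(l, projection), remap(r, projection))
  }
}
```

For example, if the input filters are `a = 1` and `b > 5` and the post-scan filter is written as `1 = a`, the canonical comparison matches it against `a = 1`, so only `b > 5` is reported as pushed. Comparing raw trees instead would wrongly report `a = 1` as pushed even though it is still evaluated after the scan.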
