yyanyy opened a new pull request, #55252:
URL: https://github.com/apache/spark/pull/55252

   ### What changes were proposed in this pull request?
   Add a `pushedFilters: Seq[Expression]` field to `DataSourceV2ScanRelation` 
that records which Catalyst filter expressions were fully pushed down to the 
data source and no longer appear as post-scan `Filter` nodes.
   
   The field is computed in `V2ScanRelationPushDown.pushDownFilters` as the 
set difference between the normalized input filters and the post-scan filters, 
compared canonically via `ExpressionSet`. In `pruneColumns`, it is then 
remapped through `projectionFunc` so that attribute references (including 
those into nested struct types) stay consistent with the pruned scan output.
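
   As a rough illustration of the set-difference step only, here is a minimal, self-contained Scala sketch. It does not use Spark's classes: `Expr`, `canonical`, and `fullyPushed` are hypothetical stand-ins, and the toy canonicalization only loosely imitates the canonical comparison that `ExpressionSet` provides. The remapping through `projectionFunc` is not modeled here.

```scala
object PushedFiltersSketch {
  // Toy stand-ins for Catalyst expressions (hypothetical, not Spark's types).
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr

  // Toy canonical form: order the operands of the commutative EqualTo so that
  // `a = 1` and `1 = a` compare equal. Real canonicalization covers far more.
  def canonical(e: Expr): Expr = e match {
    case EqualTo(l, r) =>
      val cl = canonical(l)
      val cr = canonical(r)
      if (cl.toString.compareTo(cr.toString) <= 0) EqualTo(cl, cr) else EqualTo(cr, cl)
    case GreaterThan(l, r) => GreaterThan(canonical(l), canonical(r))
    case other => other
  }

  // Input filters that do not survive as post-scan filters are treated as
  // fully pushed. Comparison is done on canonical forms; input order is kept.
  def fullyPushed(input: Seq[Expr], postScan: Seq[Expr]): Seq[Expr] = {
    val remaining = postScan.map(canonical).toSet
    input.filterNot(f => remaining.contains(canonical(f)))
  }
}
```

   With input filters `a = 1` and `b > 5`, and `b > 5` left as a post-scan filter, the sketch reports only `a = 1` as fully pushed. Writing the post-scan filter as `1 = a` instead still matches `a = 1`, because the comparison is made on canonical forms rather than on the expressions as written.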
   
   Other scan-building paths (`buildScanWithPushedAggregate`, 
`buildScanWithPushedJoin`, `buildScanWithPushedVariants`) use the default empty 
value since their output schemas differ from table columns.
   
   The field is **not** yet wired into `validConstraints`. Doing so would 
change filter inference behavior (e.g., `InferFiltersFromConstraints` 
adding/removing filters, `PruneFilters` dropping redundant post-scan filters) 
and requires plan stability testing first.
   
   
   ### Why are the changes needed?
   Once a filter is pushed into a DSv2 scan, the logical plan loses track of it 
-- `DataSourceV2ScanRelation` contributes no constraints to the optimizer. This 
prevents constraint propagation (e.g., inferring filters across joins) and 
identification of redundant post-scan filters.
   
   This change stores the pushed filter information on the scan relation as 
groundwork for future constraint propagation.
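
   To make the gap concrete, the following toy sketch contrasts a scan that exposes no constraints with one that exposes its pushed filters. Everything in it (`Pred`, `Scan`, `constraints`, `splitRedundant`) is hypothetical and exists only for illustration. It is not how this PR or the optimizer is implemented, and this PR itself does not wire the field into `validConstraints`.

```scala
object ConstraintSketch {
  // Hypothetical toy predicate, e.g. Pred("x", ">", 5) for `x > 5`.
  final case class Pred(column: String, op: String, value: Int)

  // Toy scan node: `pushedFilters` holds predicates the source already enforces.
  final case class Scan(table: String, pushedFilters: Seq[Pred] = Nil) {
    // With exposePushed = false the scan contributes no constraints, which
    // mirrors the situation described above; with true it exposes them.
    def constraints(exposePushed: Boolean): Set[Pred] =
      if (exposePushed) pushedFilters.toSet else Set.empty[Pred]
  }

  // Split post-scan predicates into (redundant, stillNeeded) given the
  // constraints already known to hold on the scan output.
  def splitRedundant(postScan: Seq[Pred], known: Set[Pred]): (Seq[Pred], Seq[Pred]) =
    postScan.partition(p => known.contains(p))
}
```

   In this model, a post-scan `x > 5` above a scan that already pushed `x > 5` is flagged as redundant only when the scan exposes its pushed filters. With no constraints exposed, both post-scan predicates have to be kept.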
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added unit tests
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Claude Code (Opus 4.6)

