gengliangwang opened a new pull request, #55664: URL: https://github.com/apache/spark/pull/55664
### What changes were proposed in this pull request?

Document the predicate pushdown contract for CDC `Changelog` connectors in the `Changelog` Javadoc:

- When any post-processing pass applies (carry-over removal, update detection, or netChanges), the connector's `SupportsPushDownFilters` / `SupportsPushDownV2Filters` implementation will only receive predicates that reference `_commit_version`, `_commit_timestamp`, or columns named by `rowId()`.
- Predicates on `_change_type`, the `rowVersion()` column, or any data column are kept above the scan and never reach `pushFilters` / `pushPredicates`, because pushing them would drop a single half of a delete/insert pair within a row-identity group and silently break post-processing.
- The restriction is enforced by the rewrite shape itself: a `Window` / `Aggregate` / `TransformWithState` keyed on the safe columns sits between the relation and the user's filter, so Catalyst's predicate-pushdown rules naturally block unsafe pushes.

Connectors do not need to code this restriction themselves, but they must not bypass it (e.g. by self-applying filters from connector-specific options).

This is a sub-task of SPARK-55668.

### Why are the changes needed?

The contract was implicit. A connector author reading the Javadoc could reasonably implement `SupportsPushDownFilters` and accept all predicates, including unsafe ones, expecting Spark to handle the rest. Spelling out which predicates the connector actually needs to handle (and why others are intentionally never delivered) prevents accidental misuse and explains the asymmetry to anyone debugging an unexpected post-scan filter.

### Does this PR introduce _any_ user-facing change?

Documentation only. No behavior change.

### How was this patch tested?

`Xdoclint:html,syntax,accessibility` is clean on `Changelog.java`. No code changed; existing CDC test suites unaffected.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude opus-4-7
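
For readers who want a concrete picture of the pushdown contract described above, here is a minimal, self-contained sketch of the rule. It does not use Spark's API and is not how Spark enforces the contract (the PR says enforcement falls out of the plan shape). The class `ChangelogPushdownSketch`, the method `pushable`, and the representation of a predicate as a label mapped to its referenced column names are all hypothetical, chosen only for illustration. The sketch models the case where a post-processing pass applies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ChangelogPushdownSketch {

    // Metadata columns the PR description lists as safe to reference.
    static final Set<String> SAFE_METADATA =
        Set.of("_commit_version", "_commit_timestamp");

    /**
     * Hypothetical helper: given predicates (label -> referenced columns) and
     * the columns named by rowId(), returns the labels of predicates that would
     * reach the connector. Every other predicate stays above the scan.
     */
    static List<String> pushable(Map<String, Set<String>> predicates,
                                 Set<String> rowIdColumns) {
        Set<String> safe = new HashSet<>(SAFE_METADATA);
        safe.addAll(rowIdColumns);

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : predicates.entrySet()) {
            // A predicate is pushable only if ALL of its columns are safe.
            if (safe.containsAll(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the output is deterministic.
        Map<String, Set<String>> predicates = new LinkedHashMap<>();
        predicates.put("_commit_version >= 5", Set.of("_commit_version"));
        predicates.put("id = 42", Set.of("id"));
        predicates.put("_change_type = 'insert'", Set.of("_change_type"));
        predicates.put("id = 42 OR price > 10", Set.of("id", "price"));

        // Assumes rowId() names a single column called "id" (illustrative only).
        System.out.println(pushable(predicates, Set.of("id")));
        // prints [_commit_version >= 5, id = 42]
    }
}
```

In this example the predicate on `_change_type` and the one touching the data column `price` are held back, matching the second bullet of the contract: applying either at the scan could remove one half of a delete/insert pair before post-processing sees it.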
