viirya commented on code in PR #55664:
URL: https://github.com/apache/spark/pull/55664#discussion_r3179737761


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Changelog.java:
##########
@@ -75,6 +75,23 @@
  * row identities only touched in the latest observed commit are held back until either a
  * later commit (with strictly greater `_commit_timestamp`) advances the global watermark
  * past them, or the source terminates.
+ * <p>
+ * <b>Pushdown contract.</b> When any post-processing pass applies (carry-over
+ * removal, update detection, or netChanges), Spark only pushes predicates
+ * that reference {@code _commit_version}, {@code _commit_timestamp}, or
+ * columns named by {@link #rowId()} to the connector's
+ * {@link org.apache.spark.sql.connector.read.SupportsPushDownFilters} /
+ * {@link org.apache.spark.sql.connector.read.SupportsPushDownV2Filters}.
+ * Predicates on {@code _change_type}, the {@link #rowVersion()} column, or
+ * data columns are kept above the scan: pushing them would drop one half of
+ * a delete/insert pair within a row-identity group and silently break
+ * post-processing. Catalyst's pushdown rules enforce this via the rewrite
+ * operators, so connectors do not need to code the restriction themselves --
+ * but must not bypass it via connector-specific options. When no
+ * post-processing pass applies, pushdown is unrestricted.

Review Comment:
   "pushdown is unrestricted" is unclear about which restriction is lifted. It would be better to be specific:
   
   ```suggestion
    * but must not bypass it via connector-specific options. When no
    * post-processing pass applies, Spark does not impose any CDC-specific
    * predicate-pushdown restriction.
   ```
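
   To make the contract concrete, here is a minimal sketch of the rule as the Javadoc words it. It is not the Catalyst implementation: the class `CdcPushdownSketch`, the method `isPushable`, and the modelling of a predicate as the set of column names it references are all hypothetical, chosen only for illustration.

   ```java
   import java.util.Set;

   // Hypothetical illustration only; this is not Spark code.
   final class CdcPushdownSketch {
       // Metadata columns the Javadoc names as always safe to push.
       private static final Set<String> COMMIT_COLUMNS =
           Set.of("_commit_version", "_commit_timestamp");

       /**
        * @param referencedColumns columns the predicate reads (assumed model of a predicate)
        * @param rowIdColumns      columns named by Changelog#rowId()
        * @param postProcessing    true if carry-over removal, update detection,
        *                          or netChanges applies
        */
       static boolean isPushable(Set<String> referencedColumns,
                                 Set<String> rowIdColumns,
                                 boolean postProcessing) {
           if (!postProcessing) {
               // No CDC-specific restriction when no post-processing pass applies.
               return true;
           }
           for (String column : referencedColumns) {
               boolean allowed =
                   COMMIT_COLUMNS.contains(column) || rowIdColumns.contains(column);
               if (!allowed) {
                   // e.g. _change_type, the rowVersion column, or a data column:
                   // keep the predicate above the scan.
                   return false;
               }
           }
           return true;
       }

       public static void main(String[] args) {
           Set<String> rowId = Set.of("id");
           System.out.println(isPushable(Set.of("_commit_version"), rowId, true));
           System.out.println(isPushable(Set.of("_change_type"), rowId, true));
           System.out.println(isPushable(Set.of("_change_type"), rowId, false));
       }
   }
   ```

   In this sketch a predicate that mixes an allowed column with a disallowed one is treated as not pushable as a whole; whether Catalyst splits such conjunctions first is outside what the Javadoc states.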



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

