viirya commented on code in PR #55664:
URL: https://github.com/apache/spark/pull/55664#discussion_r3179723403
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Changelog.java:
##########
@@ -75,6 +75,23 @@
* row identities only touched in the latest observed commit are held back until either a
* later commit (with strictly greater `_commit_timestamp`) advances the global watermark
* past them, or the source terminates.
+ * <p>
+ * <b>Pushdown contract.</b> When any post-processing pass applies (carry-over
+ * removal, update detection, or netChanges), Spark only pushes predicates
+ * that reference {@code _commit_version}, {@code _commit_timestamp}, or
+ * columns named by {@link #rowId()} to the connector's
+ * {@link org.apache.spark.sql.connector.read.SupportsPushDownFilters} /
+ * {@link org.apache.spark.sql.connector.read.SupportsPushDownV2Filters}.
+ * Predicates on {@code _change_type}, the {@link #rowVersion()} column, or
+ * data columns are kept above the scan: pushing them would drop one half of
Review Comment:
Isn't a rowId column also a data column? Maybe say "non-rowId data columns" here?
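
To illustrate the ambiguity, here is a minimal sketch of the rule as I read it from the Javadoc. This is not Spark's implementation: the class `PushdownSketch`, the method `isPushable`, and the example column names `id` and `name` are hypothetical, and it assumes a predicate is pushable only when every column it references is `_commit_version`, `_commit_timestamp`, or a rowId column.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the documented pushdown rule; not Spark code.
public class PushdownSketch {
    // Metadata columns the Javadoc names as safe to push.
    static final Set<String> PUSHABLE_METADATA =
        Set.of("_commit_version", "_commit_timestamp");

    // Assumption: a predicate is pushable only if all of its referenced
    // columns are pushable metadata columns or rowId columns.
    static boolean isPushable(List<String> referencedColumns, Set<String> rowIdColumns) {
        for (String column : referencedColumns) {
            if (!PUSHABLE_METADATA.contains(column) && !rowIdColumns.contains(column)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> rowId = Set.of("id"); // assumed: "id" is the only rowId column

        // "id" is a data column, yet it is pushable because it is a rowId column.
        System.out.println(isPushable(List.of("id"), rowId));             // true
        // A non-rowId data column stays above the scan.
        System.out.println(isPushable(List.of("name"), rowId));           // false
        // _change_type is never pushed under this rule.
        System.out.println(isPushable(List.of("_change_type"), rowId));   // false
        // Mixed references are kept above the scan in this sketch.
        System.out.println(isPushable(List.of("id", "name"), rowId));     // false
    }
}
```

In this reading, `id` is both a data column and pushable, which is why "data columns" alone reads as contradictory.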
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]