viirya commented on code in PR #55776: URL: https://github.com/apache/spark/pull/55776#discussion_r3220626458
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/Changelog.java:
##########
@@ -71,10 +71,21 @@
  * </ul>
  * <p>
  * Streaming reads support carry-over removal, update detection, and net change
- * computation. Net change collapses are kept in the state store keyed by row identity;
- * row identities only touched in the latest observed commit are held back until either a
- * later commit (with strictly greater `_commit_timestamp`) advances the global watermark
- * past them, or the source terminates.
+ * computation. Two streaming-specific behaviors to be aware of:
+ * <ul>
+ * <li><b>Output is delayed by one commit.</b> When a micro-batch ingests a

Review Comment:
   Suggested wording: "Output is delayed until a later micro-batch advances the watermark."

   "Output is delayed by one commit" is not accurate. The delay comes from streaming watermark/stateful append semantics, not from commit boundaries. The contract actually allows a single micro-batch to contain multiple distinct commits, and an earlier commit in a batch is not output merely because another commit appears in the same batch.
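   To make the distinction concrete, here is a minimal toy model of the behavior described above. It is only a sketch under stated assumptions, not Spark's implementation: the class `WatermarkDelaySketch` and its method `processBatch` are hypothetical names, and the model assumes that the watermark is advanced only at the end of a micro-batch (so it takes effect in the next one) and that a buffered row is released once the watermark is strictly greater than its commit timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model; it does not use or mirror any Spark API.
class WatermarkDelaySketch {
    // Rows held back, keyed by the commit timestamp they belong to.
    private final TreeMap<Long, List<String>> buffered = new TreeMap<>();
    // Watermark computed from previous batches only.
    private long watermark = Long.MIN_VALUE;

    /** Processes one micro-batch and returns the rows that batch emits. */
    List<String> processBatch(long[] commitTimestamps, String[] rows) {
        // 1. Buffer every incoming row under its commit timestamp.
        for (int i = 0; i < rows.length; i++) {
            buffered.computeIfAbsent(commitTimestamps[i], k -> new ArrayList<>()).add(rows[i]);
        }
        // 2. Emit rows whose commit timestamp is strictly below the watermark
        //    carried over from earlier batches.
        List<String> out = new ArrayList<>();
        NavigableMap<Long, List<String>> ready = buffered.headMap(watermark, false);
        for (List<String> group : ready.values()) {
            out.addAll(group);
        }
        ready.clear(); // the view is backed by the map, so this drops emitted rows
        // 3. Advance the watermark; it only takes effect in the next batch.
        for (long ts : commitTimestamps) {
            watermark = Math.max(watermark, ts);
        }
        return out;
    }

    public static void main(String[] args) {
        WatermarkDelaySketch s = new WatermarkDelaySketch();
        // Batch 1 carries two distinct commits (timestamps 1 and 2).
        System.out.println(s.processBatch(new long[]{1L, 2L}, new String[]{"a@1", "b@2"})); // prints []
        // Batch 2 carries commit 3; the watermark from batch 1 now releases commit 1.
        System.out.println(s.processBatch(new long[]{3L}, new String[]{"c@3"}));            // prints [a@1]
        // Batch 3 carries commit 4; the watermark from batch 2 now releases commit 2.
        System.out.println(s.processBatch(new long[]{4L}, new String[]{"d@4"}));            // prints [b@2]
    }
}
```

   In this model, commit 1 is not emitted in batch 1 even though commit 2 arrives in the same batch. Output waits for a later micro-batch to run with an advanced watermark, which is why "delayed by one commit" does not describe the behavior.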
