gengliangwang opened a new pull request, #55776: URL: https://github.com/apache/spark/pull/55776
### What changes were proposed in this pull request?

Address two follow-up review threads on PR #55637 (streaming CDC netChanges) by clarifying the streaming behavior in the `Changelog` Javadoc.

The previous paragraph read as if the one-commit emission lag were a netChanges-specific property. In fact, carry-over removal and update detection use an append-mode `Aggregate` keyed on `_commit_timestamp` and have the same lag as the netChanges `transformWithState` timer. The paragraph also did not set expectations for what streaming netChanges actually collapses in practice.

Replaced the existing single paragraph with a bulleted list:

- **Output is delayed by one commit.** When a micro-batch ingests a commit, that commit's output rows are buffered and not emitted in the same batch. They are emitted by the next micro-batch -- the one that ingests the following commit. The last commit's output is emitted when the source terminates.
- **netChanges only merges changes that are buffered together.** For a typical CDC source that produces at most one change per row per commit, only one commit's changes are buffered at a time per row, so the streaming output is the same as `computeUpdates`. Multiple commits' changes are merged only when those commits touch the same row before the older one's output has been emitted. For full-range collapse, use a batch read.

This is a sub-task of SPARK-55668.

### Why are the changes needed?

Spelling out the emission timing and the practical netChanges scope prevents adopters from forming wrong expectations about what streaming netChanges does for typical (atomic-commit) CDC workloads. Naming the lag and the buffer-window scope explicitly also makes the doc consistent with the implementation, where both facts are properties of all three streaming post-processing paths.

### Does this PR introduce _any_ user-facing change?

Documentation only. No behavior change.

### How was this patch tested?

Doc-only change.
`Xdoclint:html,syntax,accessibility` is clean on `Changelog.java` (the only errors are the expected "cannot find symbol" errors from running without a classpath). No code changed; existing CDC test suites are unaffected.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude opus-4-7
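To make the two Javadoc bullets concrete, here is a toy Java model of the described behavior. It is a hypothetical sketch, not Spark's implementation: the class `ToyNetChangesStream`, the `Change` record, the string change types, and the simplified collapse rules are all assumptions made for illustration. Each call to `microBatch` first emits whatever the previous call buffered and then buffers its own input, which models the one-commit lag when each micro-batch ingests one commit. Collapsing happens only across changes that sit in the buffer together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model of the streaming emission lag and the netChanges
 * buffer window. Not Spark's implementation; change types are plain strings.
 */
public final class ToyNetChangesStream {

  /** One change: a row key, a change type ("insert", "update", "delete"), and its commit. */
  public record Change(String rowKey, String type, long commit) {}

  // Changes buffered per row key, held until the next micro-batch (or termination).
  private final Map<String, List<Change>> buffer = new LinkedHashMap<>();

  /** Ingests one micro-batch: emit what was buffered earlier, then buffer the new changes. */
  public List<Change> microBatch(List<Change> ingested) {
    List<Change> emitted = flush();
    for (Change c : ingested) {
      buffer.computeIfAbsent(c.rowKey(), k -> new ArrayList<>()).add(c);
    }
    return emitted;
  }

  /** Emits everything still buffered, e.g. when the source terminates. */
  public List<Change> flush() {
    List<Change> out = new ArrayList<>();
    for (List<Change> changes : buffer.values()) {
      Change merged = netChange(changes);
      if (merged != null) {
        out.add(merged);
      }
    }
    buffer.clear();
    return out;
  }

  // Simplified (assumed) collapse rule for the changes buffered for one row.
  private static Change netChange(List<Change> changes) {
    Change first = changes.get(0);
    Change last = changes.get(changes.size() - 1);
    boolean existedBefore = !first.type().equals("insert");
    boolean existsAfter = !last.type().equals("delete");
    if (!existedBefore && !existsAfter) {
      return null; // inserted then deleted inside the buffer: no net change
    }
    String type = !existedBefore ? "insert" : (!existsAfter ? "delete" : "update");
    return new Change(last.rowKey(), type, last.commit());
  }
}
```

With one commit per micro-batch, each row's buffer holds a single change, so nothing is collapsed and the output matches the per-commit output. Only when a single micro-batch ingests several commits that touch the same row (for example, an insert followed by a delete) does the toy merge them, mirroring the "buffered together" condition in the second bullet.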
