gengliangwang opened a new pull request, #55663:
URL: https://github.com/apache/spark/pull/55663

   ### What changes were proposed in this pull request?
   
   Tighten the CDC `Changelog` connector contract so that `_commit_version` 
must be either `LongType` or `StringType`. Previously any `AtomicType` was 
accepted, which silently allowed several edge-case types (`IntegerType`, 
`TimestampType`, `BinaryType`, `DecimalType`, `FloatType`, `DoubleType`, 
`BooleanType`, ...).
   
   - `ChangelogTable.validateSchema` now rejects any type other than `LongType` 
/ `StringType`, reporting `BIGINT or STRING` as the expected type.
   - `Changelog` Javadoc updated to state the narrower contract and explain the 
ordering requirement (the netChanges post-processing path sorts rows by this 
column, so the column's natural ordering must match commit order).
   - `CdcNetChangesStatefulProcessor` ordering comment updated; the existing 
Catalyst-routed comparator is left in place for symmetry with the batch 
`SortOrder`.
   - `ChangelogResolutionSuite` updates: accept-list narrowed to `Long` / 
`String`; reject-list expanded to cover the previously allowed atomic types 
(`Integer`, `Timestamp`) plus the existing complex-type cases.
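
   To make the new accept/reject rule concrete, here is a minimal, 
dependency-free Java sketch of the check. `ColumnType`, 
`validateCommitVersionType`, `isAccepted`, and the error text are hypothetical 
stand-ins for illustration only; they are not Spark's `DataType` classes, the 
real `ChangelogTable.validateSchema` signature, or its actual error message. 
The sketch only shows the shape of the contract: an explicit two-entry 
accept-list instead of an `AtomicType` test.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical, dependency-free sketch of the narrowed _commit_version check.
 * ColumnType stands in for Spark's DataType hierarchy; it is NOT a Spark API.
 */
public class CommitVersionCheck {

  // Illustrative subset of types a connector might declare for _commit_version.
  enum ColumnType {
    LONG, STRING, INTEGER, TIMESTAMP, BINARY, DECIMAL, FLOAT, DOUBLE, BOOLEAN, ARRAY, STRUCT
  }

  // Only BIGINT (LongType) and STRING (StringType) are accepted under the new contract.
  static final Set<ColumnType> ALLOWED = EnumSet.of(ColumnType.LONG, ColumnType.STRING);

  /** Throws IllegalArgumentException for any type outside the accept-list. */
  static void validateCommitVersionType(ColumnType actual) {
    if (!ALLOWED.contains(actual)) {
      throw new IllegalArgumentException(
          "Column _commit_version has type " + actual + ", expected BIGINT or STRING");
    }
  }

  /** Convenience wrapper: true if validation passes, false if it throws. */
  static boolean isAccepted(ColumnType t) {
    try {
      validateCommitVersionType(t);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Print the accept/reject decision for every illustrative type.
    for (ColumnType t : ColumnType.values()) {
      System.out.println(t + " -> " + (isAccepted(t) ? "accepted" : "rejected"));
    }
  }
}
```

With an explicit accept-list, any type not named is rejected by default, so 
relaxing the contract later only means adding an entry.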
   
   ### Why are the changes needed?
   
   `Long` (a numeric, monotonically increasing version) and `String` (a 
lexicographically ordered commit identifier) cover every realistic CDC source. 
The other atomic types either are strict subsets of an allowed type 
(`IntegerType` -> `LongType`) or duplicate the role of `_commit_timestamp` 
(`TimestampType`); types such as `BinaryType`, `FloatType`, and `DoubleType` 
add NaN, boxing, and ordering foot-guns without adding any expressive power. 
The narrower contract also lets the Javadoc state the ordering requirement 
precisely, matching what the netChanges code actually relies on.
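
   The ordering requirement is easy to violate with `String` identifiers. The 
plain-Java sketch below (not Spark code; `CommitOrderPitfalls`, 
`sortLexicographically`, and `zeroPad` are hypothetical names) shows that 
unpadded numeric strings sort as `10, 2, 9` under natural string order, while 
fixed-width zero-padded identifiers sort in commit order. It also shows one NaN 
pitfall: Java's primitive `<` and `>` return `false` whenever an operand is 
NaN, whereas `Double.compare` ranks NaN above positive infinity, so two 
comparison paths can disagree about where a NaN version belongs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CommitOrderPitfalls {

  /** Returns a sorted copy of the identifiers using String's natural (lexicographic) order. */
  static List<String> sortLexicographically(List<String> ids) {
    List<String> copy = new ArrayList<>(ids);
    Collections.sort(copy);
    return copy;
  }

  /** Hypothetical helper: zero-pads a commit number so lexicographic order matches numeric order. */
  static String zeroPad(long commit, int width) {
    return String.format("%0" + width + "d", commit);
  }

  public static void main(String[] args) {
    // Unpadded numeric strings: "10" sorts before "2" and "9", so natural order != commit order.
    System.out.println(sortLexicographically(Arrays.asList("9", "10", "2")));

    // Fixed-width, zero-padded identifiers sort in commit order.
    System.out.println(
        sortLexicographically(Arrays.asList(zeroPad(9, 4), zeroPad(10, 4), zeroPad(2, 4))));

    // Primitive comparisons involving NaN are always false, even NaN == NaN...
    double nan = Double.NaN;
    System.out.println((nan < 1.0) + " " + (nan > 1.0) + " " + (nan == nan));

    // ...while Double.compare places NaN above every other value, including +Infinity.
    System.out.println(Double.compare(nan, Double.POSITIVE_INFINITY) > 0);
  }
}
```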
   
   Locking this down now is non-breaking (there are no external connectors 
yet) and keeps the documented surface area small. Relaxing the restriction 
later would be non-breaking; tightening it later would not.
   
   ### Does this PR introduce _any_ user-facing change?
   
   The `Changelog` connector API is `@Evolving` and has no external 
implementations yet; the restriction only narrows what implementers may return. 
No user-facing behavior change.
   
   ### How was this patch tested?
   
   - `ChangelogResolutionSuite` (27 tests) covers the new accept / reject 
matrix.
   - `ResolveChangelogTablePostProcessingSuite`, 
`ResolveChangelogTableStreamingPostProcessingSuite`, 
`ResolveChangelogTableNetChangesSuite`, `ChangelogEndToEndSuite` -- 130 
existing tests still pass on the new contract.
   - `UnsupportedOperationsSuite` (216 tests) still passes.
   - `-Xdoclint:html,syntax,accessibility` is clean on `Changelog.java`; no new 
warnings under `-Xdoclint:all`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude opus-4-7

