yihua opened a new pull request, #18896:
URL: https://github.com/apache/hudi/pull/18896

   ### Describe the issue this Pull Request addresses
   
   Several streamer sources hardcoded `new StreamerCheckpointV2(...)` and so 
emitted v2 checkpoint keys in commit metadata regardless of the configured 
write table version. For Hudi table version 6 this is incorrect, since v1 keys 
are expected (`KafkaSource` was the original repro).
   
   ### Summary and Changelog
   
   This PR routes every affected source (`KafkaSource`, `KinesisSource`, 
`JdbcSource`, `SqlFileBasedSource`, `PulsarSource`, `DebeziumSource`, 
`GcsEventsSource`, `HiveIncrPullSource`) and selector helper 
(`DFSPathSelector`, `DatePartitionPathSelector`, `S3EventsMetaSelector`) 
through `CheckpointUtils.createCheckpoint(writeTableVersion, key)` so the 
emitted checkpoint class is determined solely by `WRITE_TABLE_VERSION` (v6 → 
v1, v8 → v2). It also removes the now-dead helpers 
(`Source#assertCheckpointVersion` and the `InputBatch(Option, String, ...)` 
constructors).
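
As a rough illustration of the routing described above, the following sketch shows a single factory choosing the checkpoint class from the write table version. All types here (`Checkpoint`, `CheckpointV1`, `CheckpointV2`, `CheckpointFactorySketch`) and the `int` version parameter are hypothetical stand-ins, not the real Hudi classes or the actual signature of `CheckpointUtils.createCheckpoint`. Only the two versions named in this PR are handled.

```java
// Illustrative stand-ins only; these are NOT the real Hudi classes.
interface Checkpoint {
    String key();
}

final class CheckpointV1 implements Checkpoint {
    private final String key;
    CheckpointV1(String key) { this.key = key; }
    @Override public String key() { return key; }
}

final class CheckpointV2 implements Checkpoint {
    private final String key;
    CheckpointV2(String key) { this.key = key; }
    @Override public String key() { return key; }
}

final class CheckpointFactorySketch {
    private CheckpointFactorySketch() {}

    // Hypothetical analogue of CheckpointUtils.createCheckpoint(writeTableVersion, key).
    // Assumption: only v6 -> V1 and v8 -> V2 are modelled, as stated in the PR text;
    // how any other table version maps is deliberately left out.
    static Checkpoint createCheckpoint(int writeTableVersion, String key) {
        switch (writeTableVersion) {
            case 6:
                return new CheckpointV1(key);
            case 8:
                return new CheckpointV2(key);
            default:
                throw new IllegalArgumentException(
                    "Table version not covered by this sketch: " + writeTableVersion);
        }
    }

    public static void main(String[] args) {
        // A source passes the version through instead of picking a class itself.
        Checkpoint forV6 = createCheckpoint(6, "example-key");
        Checkpoint forV8 = createCheckpoint(8, "example-key");
        System.out.println(forV6.getClass().getSimpleName()); // CheckpointV1
        System.out.println(forV8.getClass().getSimpleName()); // CheckpointV2
    }
}
```

The point of the design is that a source never names a concrete checkpoint class, so a hardcoded `new StreamerCheckpointV2(...)` cannot reappear in one source without bypassing the factory.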
   
   Incremental sources (`HoodieIncrSource`, `S3EventsHoodieIncrSource`, 
`GcsEventsHoodieIncrSource`) are not affected — they have their own checkpoint 
construction logic and already route through 
`CheckpointUtils.buildCheckpointFromGeneralSource` / the 
`DATASOURCES_NOT_SUPPORTED_WITH_CKPT_V2` allowlist.
   
   ### Impact
   
   Streamer commit metadata for table version 6 now correctly emits v1 
checkpoint keys (`deltastreamer.checkpoint.key`) instead of v2 keys 
(`streamer.checkpoint.key.v2`). No public API change.
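
To make the user-visible effect concrete, here is a minimal sketch of which commit metadata key would carry the checkpoint for each write table version. The two key strings are taken from the description above. The class `CheckpointMetadataSketch`, the method `commitMetadataFor`, and the plain `Map` representation are assumptions made for illustration and do not reflect how Hudi actually assembles commit metadata.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Hudi.
final class CheckpointMetadataSketch {
    // Key names as quoted in this PR description.
    static final String V1_KEY = "deltastreamer.checkpoint.key";
    static final String V2_KEY = "streamer.checkpoint.key.v2";

    private CheckpointMetadataSketch() {}

    // Returns the extra commit metadata entry that would hold the checkpoint.
    // Assumption: only table versions 6 and 8 are modelled.
    static Map<String, String> commitMetadataFor(int writeTableVersion, String checkpoint) {
        Map<String, String> extraMetadata = new HashMap<>();
        if (writeTableVersion == 6) {
            extraMetadata.put(V1_KEY, checkpoint);
        } else if (writeTableVersion == 8) {
            extraMetadata.put(V2_KEY, checkpoint);
        } else {
            throw new IllegalArgumentException(
                "Table version not covered by this sketch: " + writeTableVersion);
        }
        return extraMetadata;
    }

    public static void main(String[] args) {
        // With this fix, a v6 table should see only the v1 key.
        System.out.println(commitMetadataFor(6, "example-checkpoint").keySet());
        System.out.println(commitMetadataFor(8, "example-checkpoint").keySet());
    }
}
```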
   
   ### Risk Level
   
   medium: touches the checkpoint write path in every streamer source. Added 
`TestStreamerSourceCheckpointVersion`, which exercises each source class across 
{v6, v8} write table versions and {V1, V2} input checkpoints and asserts the 
returned checkpoint class is determined solely by the write table version. 
Added a `testCreateCheckpoint` parameterized test for 
`CheckpointUtils.createCheckpoint` in `TestCheckpointUtils`.
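
The shape of that test matrix can be sketched in plain Java as follows. This is not the code of `TestStreamerSourceCheckpointVersion`; `CheckpointMatrixSketch`, `Kind`, and `returnedKind` are hypothetical names, and the stand-in source simply models the intended behaviour so that the cross-product loop has something to check.

```java
// Hypothetical sketch of the {v6, v8} x {V1, V2} test matrix; not Hudi test code.
final class CheckpointMatrixSketch {
    enum Kind { V1, V2 }

    private CheckpointMatrixSketch() {}

    // Stand-in for a source's fetch path. The input checkpoint kind is accepted
    // but intentionally ignored: only the write table version decides the result.
    static Kind returnedKind(int writeTableVersion, Kind inputKind) {
        if (writeTableVersion == 6) {
            return Kind.V1;
        }
        if (writeTableVersion == 8) {
            return Kind.V2;
        }
        throw new IllegalArgumentException(
            "Table version not covered by this sketch: " + writeTableVersion);
    }

    // Runs every (version, input kind) combination and returns how many were checked.
    static int runMatrix() {
        int[] versions = {6, 8};
        Kind[] expected = {Kind.V1, Kind.V2}; // expected[i] pairs with versions[i]
        int checked = 0;
        for (int i = 0; i < versions.length; i++) {
            for (Kind input : Kind.values()) {
                Kind actual = returnedKind(versions[i], input);
                if (actual != expected[i]) {
                    throw new AssertionError("v" + versions[i] + " with input " + input
                        + " returned " + actual + ", expected " + expected[i]);
                }
                checked++;
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("combinations checked: " + runMatrix());
    }
}
```

Iterating over the input checkpoint kind as well as the version is what guards against a source echoing back the class of the checkpoint it was handed.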
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable

