nsivabalan opened a new issue, #18888: URL: https://github.com/apache/hudi/issues/18888
### Tips before filing an issue

- [x] Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?

### Describe the problem you faced

When `HoodieStreamer` ingests with `hoodie.write.table.version=6` and the source is one that funnels a `StreamerCheckpointV2` through `InputBatch#getCheckpointForNextBatch` (e.g. any `*DFSSource` via `DFSPathSelector`, Kafka via the `InputBatch(batch, String)` constructor, Kinesis, Pulsar, Jdbc, Sql, Hive, HoodieIncrSource, DatePartitionPathSelector, Debezium, S3EventsMetaSelector, GcsEventsSource), the persisted commit metadata carries the V2 key `streamer.checkpoint.key.v2` instead of the V1 key `deltastreamer.checkpoint.key`.

This is a contract violation: `CheckpointUtils.shouldTargetCheckpointV2(writeTableVersion, sourceClassName)` returns `false` for `writeTableVersion < 8`, so V1 is required for v6 tables.

### To Reproduce

Steps to reproduce the behavior:

1. Use HoodieStreamer with `--source-class org.apache.hudi.utilities.sources.ParquetDFSSource`
2. Pass `--hoodie-conf hoodie.write.table.version=6`
3. After a sync produces a commit, inspect `extraMetadata` in the `.commit` file
4. Observe `streamer.checkpoint.key.v2=<value>` is present and `deltastreamer.checkpoint.key` is absent

### Expected behavior

For `hoodie.write.table.version=6`, the persisted commit metadata should carry the V1 key (`deltastreamer.checkpoint.key`) and not the V2 key (`streamer.checkpoint.key.v2`).

### Environment Description

- Hudi version: master (1.x line)
- Spark version: 3.5
- Hive version: n/a
- Hadoop version: n/a
- Storage (HDFS/S3/GCS..): any
- Running on Docker?: no

### Additional context

Root cause: `DFSPathSelector.getNextFilePathsAndMaxModificationTime` (lines 154 & 160) and similar helpers in other sources unconditionally construct `new StreamerCheckpointV2(...)` without checking the target table version. `StreamSync.extractCheckpointMetadata` then trusts that type and writes V2 keys regardless of write table version.

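To make the expected contract concrete, here is a minimal standalone sketch of choosing the commit-metadata key from the write table version. It is not Hudi code: the class `CheckpointKeySketch` and its methods `checkpointKeyFor` and `checkpointMetadata` are hypothetical names used only for illustration. The sketch also ignores the source class name that the real `CheckpointUtils.shouldTargetCheckpointV2` takes, so its behavior for table versions 8 and above is a simplifying assumption. Only the two key strings and the `< 8` rule come from this report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; it does not exist in Hudi.
class CheckpointKeySketch {
    // Key names as quoted in this report.
    static final String CHECKPOINT_KEY_V1 = "deltastreamer.checkpoint.key";
    static final String CHECKPOINT_KEY_V2 = "streamer.checkpoint.key.v2";

    // Per the report, write table versions below 8 must not target checkpoint V2.
    static final int FIRST_V2_TABLE_VERSION = 8;

    // Picks the metadata key from the write table version alone.
    // Assumption: the source class name is ignored here for simplicity.
    static String checkpointKeyFor(int writeTableVersion) {
        return writeTableVersion < FIRST_V2_TABLE_VERSION
                ? CHECKPOINT_KEY_V1
                : CHECKPOINT_KEY_V2;
    }

    // Builds the checkpoint entry that would be placed in a commit's extra metadata.
    static Map<String, String> checkpointMetadata(int writeTableVersion, String checkpointValue) {
        Map<String, String> extraMetadata = new HashMap<>();
        extraMetadata.put(checkpointKeyFor(writeTableVersion), checkpointValue);
        return extraMetadata;
    }

    public static void main(String[] args) {
        // The checkpoint value is an arbitrary placeholder.
        System.out.println(checkpointMetadata(6, "1700000000000"));
        System.out.println(checkpointMetadata(8, "1700000000000"));
    }
}
```

With `writeTableVersion = 6` the resulting map holds only `deltastreamer.checkpoint.key`, which is the outcome the Expected behavior section asks for. The bug described here corresponds to skipping the version check and always writing the V2 key.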
The branch two lines below for the no-batch fallback already calls `buildCheckpointFromGeneralSource` and gets the version contract right, so this is a missed branch in an existing chokepoint, not a missing primitive.

PR: WIP, will link.

### Stacktrace

n/a: wrong-key persistence, not a thrown exception.
