qq726658006 opened a new pull request, #18262: URL: https://github.com/apache/hudi/pull/18262
### Describe the issue this Pull Request addresses

While testing the fix in https://github.com/apache/hudi/pull/18206, I found that a few duplicate inserts still occurred even with the bucket index, and that they were concentrated at the end of the data. This PR modifies the comparison condition, which resolves the problem.

### Summary and Changelog

Enhanced `HoodieCDCFileSplit.compareTo`: when instants are equal and the inference case is `LOG_FILE`, use the `beforeFileSlice` log-file count as a tie-breaker to preserve deterministic ordering for splits in the same instant.

### Impact

Improves determinism of write/commit ordering for Flink CDC, especially MOR + upsert.

### Risk Level

low

### Documentation Update

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
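
For reviewers who want a concrete picture of the tie-breaker described in the changelog, here is a minimal sketch, not the actual Hudi code. `CdcSplitSketch`, its fields, the `OTHER` enum value, and the `sorted` helper are all hypothetical stand-ins for illustration. The ascending direction of the log-file count (fewer log files sort first) is an assumption, as is treating non-`LOG_FILE` splits as having a tie-break key of 0 so that the ordering stays transitive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a CDC split; NOT the real HoodieCDCFileSplit.
final class CdcSplitSketch implements Comparable<CdcSplitSketch> {
    // Assumed inference cases; only LOG_FILE is named in the PR description.
    enum InferCase { LOG_FILE, OTHER }

    final String instant;          // commit instant, compared lexicographically
    final InferCase inferCase;
    final int beforeLogFileCount;  // assumed: number of log files in the "before" file slice

    CdcSplitSketch(String instant, InferCase inferCase, int beforeLogFileCount) {
        this.instant = instant;
        this.inferCase = inferCase;
        this.beforeLogFileCount = beforeLogFileCount;
    }

    // Tie-break key: the log-file count for LOG_FILE splits, 0 otherwise (assumption),
    // which keeps the comparison a consistent ordering.
    private int tieBreakKey() {
        return inferCase == InferCase.LOG_FILE ? beforeLogFileCount : 0;
    }

    @Override
    public int compareTo(CdcSplitSketch other) {
        int byInstant = instant.compareTo(other.instant);
        if (byInstant != 0) {
            return byInstant;
        }
        // Same instant: fall back to the log-file count so the order is deterministic.
        return Integer.compare(tieBreakKey(), other.tieBreakKey());
    }

    // Returns a sorted copy, leaving the input list untouched.
    static List<CdcSplitSketch> sorted(List<CdcSplitSketch> splits) {
        List<CdcSplitSketch> copy = new ArrayList<>(splits);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }
}
```

Under these assumptions, two `LOG_FILE` splits with the same instant are always emitted in the same relative order regardless of their input order, which is the determinism the changelog refers to.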
