qq726658006 opened a new pull request, #18262:
URL: https://github.com/apache/hudi/pull/18262

   ### Describe the issue this Pull Request addresses
   
   While testing the fix in https://github.com/apache/hudi/pull/18206, I 
found that even with the bucket index a few duplicate inserts still occurred, 
concentrated in the last part of the data. This PR modifies the comparison 
condition; with that change, the duplicates no longer appear in my tests.
   
   ### Summary and Changelog
   
   Enhanced `HoodieCDCFileSplit.compareTo`:
   When the instants are equal and the inference case is `LOG_FILE`, the 
log-file count of `beforeFileSlice` is used as a tie-breaker, so that splits 
in the same instant are ordered deterministically.
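
   The tie-breaking idea can be sketched roughly as follows. This is a minimal, 
self-contained illustration and not the actual Hudi code: the class 
`CdcSplitSketch`, its fields, and the `InferCase` enum are hypothetical 
stand-ins. It also assumes that fewer prior log files means an earlier split, 
and that non-`LOG_FILE` splits fall back to a key of 0 so the ordering stays 
consistent.

```java
// Hypothetical stand-in for a CDC file split; NOT the real HoodieCDCFileSplit.
final class CdcSplitSketch implements Comparable<CdcSplitSketch> {

  // Assumed, simplified inference cases; only LOG_FILE matters here.
  enum InferCase { LOG_FILE, OTHER }

  final String instant;          // commit instant time, compared lexicographically
  final InferCase inferCase;     // how the CDC data for this split is inferred
  final int beforeLogFileCount;  // assumed: number of log files in the "before" file slice

  CdcSplitSketch(String instant, InferCase inferCase, int beforeLogFileCount) {
    this.instant = instant;
    this.inferCase = inferCase;
    this.beforeLogFileCount = beforeLogFileCount;
  }

  // Tie-break key: the log-file count for LOG_FILE splits, 0 otherwise (assumption).
  private int tieBreakKey() {
    return inferCase == InferCase.LOG_FILE ? beforeLogFileCount : 0;
  }

  @Override
  public int compareTo(CdcSplitSketch other) {
    // Primary key: the instant.
    int byInstant = instant.compareTo(other.instant);
    if (byInstant != 0) {
      return byInstant;
    }
    // Same instant: order by the number of prior log files (ascending, assumed).
    return Integer.compare(tieBreakKey(), other.tieBreakKey());
  }

  public static void main(String[] args) {
    CdcSplitSketch first = new CdcSplitSketch("001", InferCase.LOG_FILE, 1);
    CdcSplitSketch second = new CdcSplitSketch("001", InferCase.LOG_FILE, 2);
    // Same instant, so the log-file count decides the order.
    System.out.println(first.compareTo(second) < 0);
  }
}
```

   Without the second comparison, two splits from the same instant compare as 
equal, so their relative order depends on how the caller happens to sort or 
collect them.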
   
   ### Impact
   
   Improves the determinism of write/commit ordering for Flink CDC, especially 
for MOR tables with upserts.
   
   ### Risk Level
   
   low
   
   ### Documentation Update
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

