[ https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-2496:
-----------------------------
    Description:
Original GH issue https://github.com/apache/hudi/issues/3709

Test case by [~xushiyan] : [https://github.com/apache/hudi/pull/3723/files]

RCA by [~shivnarayan] :

Within HoodieMergeHandle, incoming records are stored in a hash map keyed by record key.

As a result, duplicates in the 1st batch remain intact, but in the 2nd batch only records with unique keys are kept, and these are then concatenated with the 1st batch.

[https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java

  was:
Test case by [~xushiyan] : https://github.com/apache/hudi/pull/3723/files

RCA by [~shivnarayan] :

Within HoodieMergeHandle, incoming records are stored in a hash map keyed by record key.

As a result, duplicates in the 1st batch remain intact, but in the 2nd batch only records with unique keys are kept, and these are then concatenated with the 1st batch.

https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java


> Inserts are precombined even with dedup disabled
> ------------------------------------------------
>
>                 Key: HUDI-2496
>                 URL: https://issues.apache.org/jira/browse/HUDI-2496
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: Sagar Sumit
>            Priority: Major
>              Labels: sev:critical
>             Fix For: 0.10.0
>
>
> Original GH issue https://github.com/apache/hudi/issues/3709
> Test case by [~xushiyan] : [https://github.com/apache/hudi/pull/3723/files]
> RCA by [~shivnarayan] :
> Within HoodieMergeHandle, incoming records are stored in a hash map keyed by
> record key.
> As a result, duplicates in the 1st batch remain intact, but in the 2nd batch
> only records with unique keys are kept, and these are then concatenated with
> the 1st batch.
>
> [https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
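
The behaviour described in the RCA can be reproduced in isolation with a small sketch. This is not the Hudi code: the class MergeHandleDedupSketch, the Rec type, and both methods are hypothetical names made up for illustration, and the merge step is simplified to the "buffer in a map, then concatenate" behaviour the RCA describes. It only shows why buffering incoming records in a map keyed by record key silently drops duplicates, even when no dedup was requested.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hudi.
class MergeHandleDedupSketch {

    // Minimal stand-in for a record: a record key plus a payload value.
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // 1st batch (assumed path): records are written as-is, so duplicate keys survive.
    static List<Rec> writeFirstBatch(List<Rec> batch) {
        return new ArrayList<>(batch);
    }

    // 2nd batch (assumed path): incoming records are first buffered in a map
    // keyed by record key, so a later record with the same key overwrites the
    // earlier one. The surviving records are then appended to the existing ones.
    static List<Rec> mergeSecondBatch(List<Rec> existing, List<Rec> incoming) {
        Map<String, Rec> incomingByKey = new LinkedHashMap<>();
        for (Rec r : incoming) {
            incomingByKey.put(r.key, r); // duplicates collapse here
        }
        List<Rec> result = new ArrayList<>(existing);
        result.addAll(incomingByKey.values());
        return result;
    }

    public static void main(String[] args) {
        List<Rec> batch1 = new ArrayList<>();
        batch1.add(new Rec("a", "1"));
        batch1.add(new Rec("a", "2")); // duplicate key, kept in the 1st batch
        batch1.add(new Rec("b", "3"));

        List<Rec> batch2 = new ArrayList<>();
        batch2.add(new Rec("c", "4"));
        batch2.add(new Rec("c", "5")); // duplicate key, collapsed in the 2nd batch
        batch2.add(new Rec("d", "6"));

        List<Rec> afterFirst = writeFirstBatch(batch1);
        List<Rec> afterSecond = mergeSecondBatch(afterFirst, batch2);

        System.out.println(afterFirst.size());  // 3
        System.out.println(afterSecond.size()); // 5, not 6
    }
}
```

In this sketch the 2nd batch contributes two records instead of three, which matches the symptom in the issue title: inserts look precombined even though dedup was never enabled.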