[ https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu reassigned HUDI-2496:
--------------------------------

    Assignee: Helias Antoniou

> Inserts are precombined even with dedup disabled
> ------------------------------------------------
>
>                 Key: HUDI-2496
>                 URL: https://issues.apache.org/jira/browse/HUDI-2496
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: Sagar Sumit
>            Assignee: Helias Antoniou
>            Priority: Critical
>              Labels: sev:critical
>             Fix For: 0.10.0
>
>
> Original GH issue: https://github.com/apache/hudi/issues/3709
> Test case by [~xushiyan]: [https://github.com/apache/hudi/pull/3723/files]
> RCA by [~shivnarayan]:
> Within HoodieMergeHandle, we use a hashmap to store incoming records, where
> the keys are record keys.
> As a result, duplicates in the 1st batch remain intact, but for the 2nd
> batch only unique records are considered, and these are later concatenated
> with the 1st batch.
>
> [https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
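
The behaviour described in the RCA above can be illustrated with a small standalone sketch. This is not Hudi code: `MergeBufferSketch`, `Rec`, `bufferByKey` and `writeTwoBatches` are hypothetical names, and the only thing taken from the issue is the idea that incoming records are held in a map keyed by record key. Under that assumption, duplicates in the first batch survive, while duplicates in the second batch collapse to one record per key before being appended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeBufferSketch {

    // Hypothetical stand-in for an incoming record: a record key plus a payload.
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    // Buffers a batch in a map keyed by record key, as the RCA describes.
    // A later record with the same key overwrites the earlier one, so
    // duplicates within the batch are lost even if dedup is meant to be off.
    static Map<String, Rec> bufferByKey(List<Rec> batch) {
        Map<String, Rec> buffered = new LinkedHashMap<>();
        for (Rec r : batch) {
            buffered.put(r.key, r);
        }
        return buffered;
    }

    // Assumption for illustration: the first batch is written as-is, and the
    // second batch goes through the keyed buffer before being appended.
    static List<Rec> writeTwoBatches(List<Rec> first, List<Rec> second) {
        List<Rec> file = new ArrayList<>(first);
        file.addAll(bufferByKey(second).values());
        return file;
    }

    public static void main(String[] args) {
        List<Rec> first = Arrays.asList(new Rec("k1", "a"), new Rec("k1", "b"));
        List<Rec> second = Arrays.asList(new Rec("k2", "c"), new Rec("k2", "d"));
        // prints [k1=a, k1=b, k2=d]: k1 duplicates kept, k2 duplicates collapsed
        System.out.println(writeTwoBatches(first, second));
    }
}
```

In this sketch the fix would amount to buffering the second batch in a structure that allows several records per key (for example a list) whenever dedup is disabled; whether the actual Hudi change takes that shape is not stated in this message.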