[ https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-2496:
-----------------------------
    Fix Version/s: 0.10.0

> Inserts are precombined even with dedup disabled
> ------------------------------------------------
>
>                 Key: HUDI-2496
>                 URL: https://issues.apache.org/jira/browse/HUDI-2496
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: Sagar Sumit
>            Priority: Major
>              Labels: sev:critical
>             Fix For: 0.10.0
>
>
> Test case by [~xushiyan] : https://github.com/apache/hudi/pull/3723/files
> RCA by [~shivnarayan] :
> Within HoodieMergeHandle, incoming records are stored in a hash map keyed by
> record key. As a result, duplicates in the 1st batch remain intact, but for the
> 2nd batch only one record per key is kept, and those unique records are then
> concatenated with the 1st batch.
> https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
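
The effect described in the RCA can be reproduced with a toy simulation. This is a minimal sketch, not Hudi code: `write_first_batch` and `merge_second_batch` are hypothetical helpers, records are modeled as `(record_key, payload)` tuples, and the 2nd batch is assumed to contain only keys not present in the 1st batch (pure inserts). It shows how a map keyed by record key silently collapses duplicates, while the 1st batch keeps them.

```python
from typing import Dict, List, Tuple

# Hypothetical record model: (record_key, payload). Not a Hudi type.
Record = Tuple[str, str]


def write_first_batch(batch: List[Record]) -> List[Record]:
    # Hypothetical: the 1st batch is written as-is, so duplicate keys survive.
    return list(batch)


def merge_second_batch(existing: List[Record], batch: List[Record]) -> List[Record]:
    # Hypothetical: mimics the reported behavior by putting incoming records
    # into a dict keyed by record key. A later record with the same key
    # overwrites the earlier one, so duplicates within the batch collapse.
    incoming: Dict[str, Record] = {}
    for rec in batch:
        incoming[rec[0]] = rec
    # Assumes no key overlap with `existing`: existing records pass through
    # unchanged and the deduplicated incoming records are appended.
    return existing + list(incoming.values())


batch1 = [("k1", "a"), ("k1", "b")]
batch2 = [("k2", "c"), ("k2", "d")]
table = merge_second_batch(write_first_batch(batch1), batch2)
print(table)
```

Running it keeps both `k1` records but only the last `k2` record, even though neither batch asked for deduplication.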



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
