[ https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-2496:
-----------------------------
    Description: 
Original GH issue https://github.com/apache/hudi/issues/3709

Test case by [~xushiyan] : [https://github.com/apache/hudi/pull/3723/files]

RCA by [~shivnarayan] :

Within HoodieMergeHandle, we use a hashmap to store incoming records, keyed by
record key. So in the 1st batch, duplicates remain intact, but in the 2nd batch
only unique records are considered, and these are later concatenated with the
1st batch.
 
https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
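A minimal sketch of the behavior described in the RCA, assuming a simplified model rather than Hudi's actual classes: the 1st batch is written as-is, while the 2nd batch is first put into a map keyed by record key (so a later record with the same key replaces the earlier one) and the surviving records are appended to the existing file contents. The names `Rec`, `writeFirstBatch` and `mergeSecondBatch` are hypothetical, and the sketch assumes the 2nd batch contains only inserts whose keys do not overlap the 1st batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeHandleDedupSketch {

    /** Hypothetical minimal record: a record key plus a payload (not Hudi's HoodieRecord). */
    static final class Rec {
        final String key;
        final String value;

        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    /** 1st batch: records are written as-is, so duplicate keys survive. */
    static List<Rec> writeFirstBatch(List<Rec> incoming) {
        return new ArrayList<>(incoming);
    }

    /**
     * 2nd batch: incoming records go into a map keyed by record key, so duplicates
     * collapse to the last one seen; the unique records are then concatenated with
     * the existing file contents. Assumes no key overlap with the existing file.
     */
    static List<Rec> mergeSecondBatch(List<Rec> existingFile, List<Rec> incoming) {
        // LinkedHashMap only to keep the output order deterministic for this sketch.
        Map<String, Rec> keyToNewRecord = new LinkedHashMap<>();
        for (Rec r : incoming) {
            keyToNewRecord.put(r.key, r); // same key: the earlier record is replaced
        }
        List<Rec> merged = new ArrayList<>(existingFile);
        merged.addAll(keyToNewRecord.values());
        return merged;
    }

    public static void main(String[] args) {
        List<Rec> file = writeFirstBatch(List.of(new Rec("a", "1"), new Rec("a", "2")));
        List<Rec> merged = mergeSecondBatch(file, List.of(new Rec("b", "1"), new Rec("b", "2")));
        System.out.println(merged); // [a=1, a=2, b=2]
    }
}
```

Both copies of key `a` from the 1st batch remain, while only one record for key `b` survives from the 2nd batch, even though no deduplication was requested.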

  was:
Test case by [~xushiyan] : https://github.com/apache/hudi/pull/3723/files

RCA by [~shivnarayan] :

Within HoodieMergeHandle, we use a hashmap to store incoming records, keyed by
record key. So in the 1st batch, duplicates remain intact, but in the 2nd batch
only unique records are considered, and these are later concatenated with the
1st batch.
https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java



> Inserts are precombined even with dedup disabled
> ------------------------------------------------
>
>                 Key: HUDI-2496
>                 URL: https://issues.apache.org/jira/browse/HUDI-2496
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: Sagar Sumit
>            Priority: Major
>              Labels: sev:critical
>             Fix For: 0.10.0



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
