[ https://issues.apache.org/jira/browse/HUDI-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394568#comment-17394568 ]

ASF GitHub Bot commented on HUDI-2170:
--------------------------------------

danny0405 commented on a change in pull request #3401:
URL: https://github.com/apache/hudi/pull/3401#discussion_r683992963



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -239,7 +240,14 @@ protected void init(String fileId, Iterator<HoodieRecord<T>> newRecordsItr) {
         record.seal();
       }
       // NOTE: Once Records are added to map (spillable-map), DO NOT change it as they won't persist
-      keyToNewRecords.put(record.getRecordKey(), record);
+      String key = record.getRecordKey();
+      if (keyToNewRecords.containsKey(key)) {

Review comment:
       Yes, we could avoid this check, since deduplication can run before 
records enter this handle. However, `precombine` is not enabled by default; 
take a look at the Spark option `DataSourceOptions.INSERT_DROP_DUPS`.
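
For illustration only, here is a minimal sketch of deduplicating a batch by record key before it reaches a merge handle. It is not Hudi code: `DedupSketch`, `Rec`, `dedupe`, and the `orderingVal` field are hypothetical names, and the tie-breaking rule (the later record wins on an equal ordering value) is an assumption chosen to match the intent of this change.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DedupSketch {
    // Hypothetical stand-in for a record: a key, an ordering value, and a payload.
    static final class Rec {
        final String key;
        final long orderingVal;
        final String payload;

        Rec(String key, long orderingVal, String payload) {
            this.key = key;
            this.orderingVal = orderingVal;
            this.payload = payload;
        }
    }

    // Keep one record per key. When two records share a key, the one with the
    // larger ordering value wins; on a tie, the later (incoming) record wins.
    static Map<String, Rec> dedupe(List<Rec> incoming) {
        Map<String, Rec> byKey = new LinkedHashMap<>();
        for (Rec r : incoming) {
            byKey.merge(r.key, r, (oldRec, newRec) ->
                    newRec.orderingVal >= oldRec.orderingVal ? newRec : oldRec);
        }
        return byKey;
    }

    public static void main(String[] args) {
        Map<String, Rec> result = dedupe(Arrays.asList(
                new Rec("k1", 5L, "first"),
                new Rec("k1", 5L, "second")));
        System.out.println(result.get("k1").payload);
    }
}
```

If a step like this always ran upstream, the `containsKey` check in the handle would be redundant; because it is optional, the handle still has to cope with duplicate keys.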






> Always choose the latest record for HoodieRecordPayload
> -------------------------------------------------------
>
>                 Key: HUDI-2170
>                 URL: https://issues.apache.org/jira/browse/HUDI-2170
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Common Core
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Currently, {{OverwriteWithLatestAvroPayload.preCombine}} still chooses the 
> old record when the new record has the same preCombine field value as the 
> old one. It is more natural to keep the new incoming record instead. The 
> {{DefaultHoodieRecordPayload.combineAndGetUpdateValue}} method already does 
> that.
> See issue: https://github.com/apache/hudi/issues/3266.
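
The difference described above comes down to how a tie on the ordering value is resolved. The following is a simplified sketch under stated assumptions, not the actual Hudi implementation: `PreCombineSketch`, `Payload`, and both method names are hypothetical, and the ordering value is reduced to a plain `long`.

```java
class PreCombineSketch {
    // Hypothetical payload carrying only an ordering value and a label.
    static final class Payload {
        final long orderingVal;
        final String label;

        Payload(long orderingVal, String label) {
            this.orderingVal = orderingVal;
            this.label = label;
        }

        // Behaviour the issue describes as current: on equal ordering values
        // the old payload is kept.
        Payload preCombineKeepOldOnTie(Payload old) {
            return old.orderingVal >= this.orderingVal ? old : this;
        }

        // Behaviour the issue proposes: on equal ordering values this
        // (incoming) payload is kept.
        Payload preCombineKeepNewOnTie(Payload old) {
            return old.orderingVal > this.orderingVal ? old : this;
        }
    }
}
```

The two variants only disagree when the ordering values are equal; a strictly larger value wins in both.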



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
