nsivabalan commented on issue #2656:
URL: https://github.com/apache/hudi/issues/2656#issuecomment-821757914


   I guess I understand what's happening. In COW, when creating a new data 
file, Hudi reads the existing data and merges it with the incoming data. From 
the merging standpoint, (partition path, record key) pairs are considered 
unique, so even if we insert the same batch again, the new data file will not 
contain duplicate records. One option is to create a unique key for every 
record, as suggested by @pengzhiwei2018.
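   If it helps, here is a minimal sketch of that workaround using the Spark 
datasource writer: tag each incoming row with a generated surrogate key and 
use it as the record key. The column names (`partition_col`, `ts`), table 
name, and paths below are placeholders for illustration, not from your setup.

   ```scala
   import org.apache.spark.sql.SaveMode
   import org.apache.spark.sql.functions.expr

   // Placeholder source; replace with your own input DataFrame.
   val df = spark.read.parquet("/tmp/source_data")

   // Attach a fresh UUID per row so re-inserting the same batch produces new
   // (record key, partition path) pairs and is not merged away as a duplicate.
   val withKey = df.withColumn("uuid_key", expr("uuid()"))

   withKey.write.format("hudi").
     option("hoodie.table.name", "my_cow_table").
     option("hoodie.datasource.write.recordkey.field", "uuid_key").
     option("hoodie.datasource.write.partitionpath.field", "partition_col").
     option("hoodie.datasource.write.precombine.field", "ts").
     option("hoodie.datasource.write.operation", "insert").
     mode(SaveMode.Append).
     save("/tmp/hudi/my_cow_table")
   ```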
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

