nsivabalan commented on issue #2656: URL: https://github.com/apache/hudi/issues/2656#issuecomment-821757914
I think I understand what's happening. In COW, when Hudi creates a new data file, it reads the existing data and merges it with the incoming data. From the merging standpoint, (partition path, record key) pairs are treated as unique, so even if we insert the same batch again, the new data file will not contain duplicated records. One option is to generate a unique key for every record, as suggested by @pengzhiwei2018.
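To make the dedup behavior concrete, here is a minimal sketch (plain Python, not Hudi code) of merge-by-key semantics: records collapse on (partition path, record key), so re-upserting the same batch adds nothing, while assigning a fresh unique key per record (the suggested workaround) makes every insert land as a new row. The field names and `uuid4` usage are illustrative assumptions, not Hudi's internals.

```python
import uuid

def upsert(table, batch):
    """Merge a batch into the table; records are deduplicated on
    (partition_path, record_key), and later writes win per key.
    This mimics COW merge semantics; it is not Hudi code."""
    for rec in batch:
        key = (rec["partition_path"], rec["record_key"])
        table[key] = rec
    return table

batch = [
    {"partition_path": "2021/03", "record_key": "id-1", "val": 10},
    {"partition_path": "2021/03", "record_key": "id-2", "val": 20},
]

table = {}
upsert(table, batch)
upsert(table, batch)  # same batch again: merged away, no duplicates
assert len(table) == 2

# Workaround discussed above: give every record a unique key
# (uuid4 here is illustrative), so each insert creates new rows.
unique_batch = [{**r, "record_key": str(uuid.uuid4())} for r in batch]
upsert(table, unique_batch)
assert len(table) == 4
```

The trade-off, of course, is that with synthetic unique keys you lose the ability to update an existing record by its natural key.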