pratyakshsharma commented on pull request #1558:
URL: https://github.com/apache/incubator-hudi/pull/1558#issuecomment-621253677


   > @pratyakshsharma : Can we do two kinds of de-duplicate together? First, 
de-duplicate by different commits, then de-duplicate same commit, they are not 
incompatible, we can remove all the duplicate data at once.
   
   Yes, it is possible to do both types of de-duplication together. But I feel 
that in most cases only one of the two (de-duplicating across different commits 
or de-duplicating within the same commit) would actually be needed. In essence, 
doing both would mean one of the two flows runs unnecessarily. That was the 
sole reason I introduced the boolean useCommitTimeForDedupe.
   So I would prefer to keep the code flow in its current form. If you feel 
strongly about it, I can make the changes you are suggesting. Please let me 
know your thoughts on this. :) 
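
   To make the trade-off above concrete, here is a minimal sketch of a single 
flag selecting one of the two flows. It is illustrative only and is not the 
code in this PR: the record layout (`key`, `commit_time`, `file`), the 
`dedupe` function, and the exact keep/drop rule in each flow are assumptions; 
only the flag name useCommitTimeForDedupe comes from the discussion.

```python
from collections import defaultdict


def dedupe(records, use_commit_time_for_dedupe):
    """Drop duplicate records per key using exactly one of two flows.

    Hypothetical sketch: each record is a dict with 'key', 'commit_time'
    and 'file'. The rules below are assumptions made for illustration.
    """
    # Group all records that share the same record key.
    by_key = defaultdict(list)
    for rec in records:
        by_key[rec["key"]].append(rec)

    kept = []
    for group in by_key.values():
        if use_commit_time_for_dedupe:
            # Flow 1 (assumed): duplicates come from different commits,
            # so keep only the copies written by the latest commit.
            latest = max(r["commit_time"] for r in group)
            kept.extend(r for r in group if r["commit_time"] == latest)
        else:
            # Flow 2 (assumed): duplicates share a commit, so keep the
            # first copy seen for each (key, commit_time) pair.
            seen_commits = set()
            for r in group:
                if r["commit_time"] not in seen_commits:
                    seen_commits.add(r["commit_time"])
                    kept.append(r)
    return kept


records = [
    {"key": "a", "commit_time": "001", "file": "f1"},
    {"key": "a", "commit_time": "002", "file": "f2"},
    {"key": "b", "commit_time": "002", "file": "f1"},
    {"key": "b", "commit_time": "002", "file": "f2"},
]

across_commits = dedupe(records, use_commit_time_for_dedupe=True)
within_commit = dedupe(records, use_commit_time_for_dedupe=False)
```

   In this sketch each call runs only the flow it needs. Running both flows on 
every call would also work, but one pass would usually find nothing to remove.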


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

