davehagman commented on pull request #3820:
URL: https://github.com/apache/hudi/pull/3820#issuecomment-946035803


   I like that idea a lot. It reduces the chance of error as well. Here are 
some thoughts:
   
   > a new config called `hoodie.copy.over.deltastreamer.checkpoints`
   
   Since this is very specific to multi-writer/OCC, what about putting it under 
the `concurrency` namespace? Something like 
`hoodie.write.concurrency.merge.deltastreamer.state`. This also replaces the 
implementation detail of "checkpoint" with a more general "state", which 
will let us extend this to other keys in the future if necessary without 
needing more configs. 
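   
   To make the gating concrete, here is a rough sketch of how a writer could 
read such a flag. It uses plain `java.util.Properties` rather than Hudi's config 
classes; the class name, the helper name and the default of `false` are all 
assumptions for illustration, and the key itself is only the name proposed 
above, not an existing Hudi option.

```java
import java.util.Properties;

// Hypothetical sketch: not Hudi code, only illustrates gating on the proposed key.
final class MergeStateConfigSketch {
    // Key name proposed in this comment; it does not exist in Hudi as of this discussion.
    static final String MERGE_STATE_KEY = "hoodie.write.concurrency.merge.deltastreamer.state";

    // Assumed default: the merge is disabled unless the key is explicitly set to "true".
    static boolean shouldMergeDeltastreamerState(Properties props) {
        return Boolean.parseBoolean(props.getProperty(MERGE_STATE_KEY, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(MERGE_STATE_KEY, "true");
        System.out.println("merge enabled: " + shouldMergeDeltastreamerState(props));
    }
}
```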
   
   > fetch value of "deltastreamer.checkpoint.key" from last committed 
transaction and copy to cur inflight commit extra metadata.
   
   Yeah, we can even reuse the existing code (it still needs my fix) that merges a 
key from the previous instant's metadata into the inflight (current) one. We 
would make that method private and expose only a new method that is 
specific to copying over checkpoint state when the above config is set. Something 
like:
   `TransactionUtils.mergeCheckpointStateFromPreviousCommit(thisInstant, 
previousCommit)`
    
   This will ultimately just call the existing 
`overrideWithLatestCommitMetadata` (now private) with the metadata 
key `deltastreamer.checkpoint.key`, which abstracts away the details and 
removes the need for users to know anything about the internal state of 
commits.
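   
   Roughly, the shape I have in mind is sketched below. This is a standalone 
illustration, not the real `TransactionUtils`: commit extra metadata is modeled 
as a plain `Map<String, String>`, the `mergeEnabled` parameter stands in for the 
config check, and copying only when the inflight commit has no checkpoint of 
its own is an assumption about the desired semantics.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: extra metadata is modeled as a Map, not Hudi's commit metadata type.
final class CheckpointMergeSketch {
    static final String CHECKPOINT_KEY = "deltastreamer.checkpoint.key";

    // Generic merge helper, kept private so callers cannot pass arbitrary keys.
    // Assumption: a value already present on the current commit is left untouched.
    private static void overrideWithLatestCommitMetadata(
            Map<String, String> current,
            Optional<Map<String, String>> previous,
            String key) {
        previous.map(m -> m.get(key))
                .ifPresent(value -> current.putIfAbsent(key, value));
    }

    // Narrow entry point: only copies checkpoint state, and only when enabled.
    static void mergeCheckpointStateFromPreviousCommit(
            Map<String, String> thisInstantMetadata,
            Optional<Map<String, String>> previousCommitMetadata,
            boolean mergeEnabled) {
        if (!mergeEnabled) {
            return; // config not set: leave the inflight commit as-is
        }
        overrideWithLatestCommitMetadata(thisInstantMetadata, previousCommitMetadata, CHECKPOINT_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> previous = new HashMap<>();
        previous.put(CHECKPOINT_KEY, "example-checkpoint"); // placeholder value
        Map<String, String> inflight = new HashMap<>();

        mergeCheckpointStateFromPreviousCommit(inflight, Optional.of(previous), true);
        System.out.println(inflight.get(CHECKPOINT_KEY));
    }
}
```

   With this shape a second writer never names the checkpoint key itself; it 
only flips the config, and the first commit on a table (no previous commit) is 
a no-op.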
   
   Thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

