tandonraghavs opened a new issue #2131:
URL: https://github.com/apache/hudi/issues/2131


   We are planning to use Hudi for our data warehouse, and we want to dump 
MongoDB data to S3.
   For MongoDB we rely on oplogs (via Debezium). But in my experiments, Hudi 
does not work when the records contain only partial data (oplog-style data).
   
   Is it recommended to always write the entire row to Hudi?
   
   - I also tried a custom **HoodieRecordPayload** that overrides 
**combineAndGetUpdateValue**, but that does not cover the use case either: 
because compaction runs in the background, it only compares the most recent 
record with the stored record, so some of the oplog updates are lost.
   
   - Another issue with oplogs is that we can have multiple records for the 
same document ID. In that case I want to apply the updates from all of the 
records, but because of preCombine only the latest record's update is applied.
   
   - Also, there is no option to set **hoodie.combine.before.upsert=false** 
when using the DataSource.
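
   To make the expected behaviour concrete, here is a minimal sketch in plain 
Python (it does not use any Hudi API). The field names `_id`, `name`, `city` 
and `ts`, and the helper functions, are hypothetical and only illustrate the 
difference between applying every partial update in order and keeping only 
the latest record before merging.

```python
from typing import Any, Dict, Iterable, Optional

Record = Dict[str, Any]


def merge_partial(base: Optional[Record], update: Record) -> Record:
    """Overlay the fields present in `update` onto `base`.

    Fields missing from `update` keep the value they had in `base`.
    """
    merged = dict(base or {})
    merged.update(update)
    return merged


def fold_updates(stored: Optional[Record], events: Iterable[Record],
                 order_key: str = "ts") -> Record:
    """Desired behaviour: apply every partial event, oldest first."""
    result = dict(stored or {})
    for event in sorted(events, key=lambda e: e[order_key]):
        result = merge_partial(result, event)
    return result


def latest_only(stored: Optional[Record], events: Iterable[Record],
                order_key: str = "ts") -> Record:
    """Observed behaviour (as an assumption): only the latest event survives
    deduplication and is then merged into the stored record."""
    latest = max(events, key=lambda e: e[order_key])
    return merge_partial(stored, latest)


# Hypothetical stored document and two partial oplog-style events for it.
stored = {"_id": 1, "name": "a", "city": "x", "ts": 0}
events = [
    {"_id": 1, "name": "b", "ts": 1},  # changes only `name`
    {"_id": 1, "city": "y", "ts": 2},  # changes only `city`
]

folded = fold_updates(stored, events)  # both changes applied
lossy = latest_only(stored, events)    # the `name` change is lost
```

   With these inputs `folded` has both `name` and `city` updated, while 
`lossy` still has the old `name`, which is the kind of lost update described 
above.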
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

