[GitHub] [hudi] tandonraghavs opened a new issue #2131: HUDI with Mongo Oplogs (Debezium)

GitBox Tue, 29 Sep 2020 12:51:19 -0700


tandonraghavs opened a new issue #2131:
URL: https://github.com/apache/hudi/issues/2131



   We are planning to use Hudi for our data warehouse, and we want to load 
MongoDB data into S3.
   For MongoDB we rely on oplogs (via Debezium). However, in my experiments 
Hudi does not work when the incoming records are partial (oplog-style data, 
i.e. records that carry only some of the fields).
   
   Is it recommended to always supply the entire row to Hudi?
   
   - I also tried overriding **combineAndGetUpdateValue** in a custom 
**HoodieRecordPayload**, but that does not cover the use case either: when 
compaction runs in the background, it only compares the most recent record 
with the stored record, so some of the oplog updates are lost.
   
   - There is another issue with oplogs: we can have multiple records with 
the same document ID. In that case I want to apply the updates from all of 
those records, but because of preCombine only the latest record's updates 
are applied.
   
   - Also, there is no option to set **hoodie.combine.before.upsert=false** 
when writing through the DataSource.
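
To make the desired behaviour concrete, here is a minimal sketch of the merge semantics described in the points above. It is plain Python and uses no Hudi APIs. The event shape (`id`, `ts`, `patch`) and the function name `merge_partial_updates` are assumptions made up for illustration; real Debezium oplog events look different.

```python
from collections import defaultdict


def merge_partial_updates(stored, events):
    """Fold partial, oplog-style patches into full rows, per document ID.

    stored: dict mapping document ID -> full row (dict)
    events: iterable of dicts with hypothetical keys
            'id' (document ID), 'ts' (ordering value), 'patch' (changed fields)
    Returns a new dict of full rows; the inputs are not modified.
    """
    # Group every incoming event by document ID instead of keeping only the latest.
    by_id = defaultdict(list)
    for event in events:
        by_id[event["id"]].append(event)

    # Copy the stored rows so the caller's data is left untouched.
    result = {doc_id: dict(row) for doc_id, row in stored.items()}

    for doc_id, doc_events in by_id.items():
        # Start from the stored row, or an empty row for a new document.
        row = result.get(doc_id, {"_id": doc_id})
        # Apply every patch in timestamp order, so earlier partial updates survive.
        for event in sorted(doc_events, key=lambda e: e["ts"]):
            row.update(event["patch"])
        result[doc_id] = row
    return result


# Example: two partial updates for the same document arrive out of order.
stored = {"a": {"_id": "a", "name": "x", "qty": 1}}
events = [
    {"id": "a", "ts": 2, "patch": {"qty": 5}},
    {"id": "a", "ts": 1, "patch": {"name": "y", "qty": 3}},
]
merged = merge_partial_updates(stored, events)
```

Keeping only the latest event here (`ts` 2) would drop the `name` change carried by the earlier one, which is the loss described above.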
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
