Hello.
Yes, Hudi supports this. You can write your own payload class to handle
precombine (dedup within a delta) and updateHistoryRecord (merging the delta
with history). The default payload is updateWithLatestRecord.
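
As a rough illustration of those two hooks, here is a minimal sketch in plain
Python. It does not use the Hudi API; the names Record, precombine,
merge_with_stored and upsert are hypothetical and only model the logic a custom
payload class would carry. In Hudi's Java API the corresponding hooks on
HoodieRecordPayload are, as far as I know, preCombine and
combineAndGetUpdateValue. The sketch assumes updatedAt is the ordering field
and replays the retry scenario from the question below.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Record:
    key: str
    updated_at: str  # "HH:MM", zero-padded so string order matches time order
    value: str


def precombine(a: Record, b: Record) -> Record:
    """Dedup within one incoming batch: keep the record with the later updated_at."""
    return b if b.updated_at > a.updated_at else a


def merge_with_stored(stored: Optional[Record], incoming: Record) -> Record:
    """Merge an incoming record with the stored one; a stale incoming record is dropped."""
    if stored is None or incoming.updated_at >= stored.updated_at:
        return incoming
    return stored


def upsert(table: Dict[str, Record], batch: Iterable[Record]) -> None:
    # Step 1: collapse duplicate keys inside the batch.
    deduped: Dict[str, Record] = {}
    for rec in batch:
        if rec.key in deduped:
            deduped[rec.key] = precombine(deduped[rec.key], rec)
        else:
            deduped[rec.key] = rec
    # Step 2: reconcile each surviving record against what is already stored.
    for key, rec in deduped.items():
        table[key] = merge_with_stored(table.get(key), rec)


# Hypothetical in-memory "table" standing in for the stored dataset.
table: Dict[str, Record] = {}
upsert(table, [Record("RecordA", "11:30", "second update")])       # 11:30 write succeeds
upsert(table, [Record("RecordA", "11:00", "first update, retry")])  # 12:00 retry of the stale write
print(table["RecordA"].updated_at)  # prints 11:30
```

The stale retry is ignored because the merge step compares updatedAt instead of
blindly overwriting, which is the behaviour a custom payload would need to
provide.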

Gary Li
________________________________
From: Jialun Liu <[email protected]>
Sent: Thursday, September 10, 2020 1:28:09 PM
To: [email protected] <[email protected]>
Subject: Apache Hudi Data Reconciliation

Hey guys,

I want to confirm whether Apache Hudi can handle data reconciliation for
use cases like late records, out-of-order records, retries, etc.

A simple example:
@11:00
RecordA, updatedAt = 11:00 (failed to update)

@11:30
RecordA, updatedAt = 11:30 (success)

@12:00 (Retry the failed update)
RecordA, updatedAt = 11:00 (should drop the record since it is stale)

I know Delta Lake can update based on conditions, so I can use the
updatedAt timestamp as the key. But how does Hudi do data reconciliation?

Best regards,
Bill
