Hello Bill, please check the code here 
https://github.com/apache/hudi/blob/a1cff8abae9d9dab87d439c90451da73cc71eebf/hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java
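
For illustration only, the stale-retry case from the original question can be sketched in plain Java with no Hudi dependency. Every name below (ReconcileSketch, Rec, mergeWithStored, upsert) is hypothetical, and preCombine only borrows the name of the payload hook; this is a sketch of the logic a custom payload class could implement, assuming updatedAt is the ordering field, not the code behind the link above.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch; none of these types come from Hudi.
class ReconcileSketch {

    /** Hypothetical record: a key plus the updatedAt value used for ordering. */
    static final class Rec {
        final String key;
        final long updatedAt; // minutes since midnight, e.g. 11:00 -> 660

        Rec(String key, long updatedAt) {
            this.key = key;
            this.updatedAt = updatedAt;
        }
    }

    /** Dedup within one incoming batch: keep the record with the larger updatedAt. */
    static Rec preCombine(Rec a, Rec b) {
        return a.updatedAt >= b.updatedAt ? a : b;
    }

    /** Merge an incoming record with the stored one: a stale incoming record is dropped. */
    static Rec mergeWithStored(Rec stored, Rec incoming) {
        if (stored == null) {
            return incoming; // first write for this key
        }
        return incoming.updatedAt >= stored.updatedAt ? incoming : stored;
    }

    /** Apply one incoming record to an in-memory "table" keyed by record key. */
    static void upsert(Map<String, Rec> table, Rec incoming) {
        table.put(incoming.key, mergeWithStored(table.get(incoming.key), incoming));
    }

    public static void main(String[] args) {
        Map<String, Rec> table = new HashMap<>();
        upsert(table, new Rec("RecordA", 690)); // 11:30 update succeeds
        upsert(table, new Rec("RecordA", 660)); // 12:00 retry of the failed 11:00 update
        System.out.println(table.get("RecordA").updatedAt); // prints 690
    }
}
```

The retried 11:00 record loses the comparison against the stored 11:30 record, so the table keeps the newer value.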

________________________________
From: Jialun Liu <[email protected]>
Sent: Friday, September 11, 2020 2:05:29 AM
To: [email protected] <[email protected]>
Subject: Re: Apache Hudi Data Reconciliation

Hey Gary,

Thanks for replying so quickly!

Could you please point me to the documentation of this feature? I would
love to take a closer look at it, thanks!

Best regards,
Bill

On Thu, Sep 10, 2020 at 12:20 AM Gary Li <[email protected]> wrote:

> Hello.
> Yes, this feature is supported by Hudi. You can write your own payload
> class to handle preCombine (dedup within the delta) and
> combineAndGetUpdateValue (merging the delta with history). The default
> payload is OverwriteWithLatestAvroPayload.
>
> Gary Li
> ________________________________
> From: Jialun Liu <[email protected]>
> Sent: Thursday, September 10, 2020 1:28:09 PM
> To: [email protected] <[email protected]>
> Subject: Apache Hudi Data Reconciliation
>
> Hey guys,
>
> I want to confirm whether Apache Hudi can handle data reconciliation
> for use cases like late records, out-of-order records, retries,
> etc.
>
> A simple example:
> @11:00
> RecordA, updatedAt = 11:00 (failed to update)
>
> @11:30
> RecordA, updatedAt = 11:30 (success)
>
> @12:00 (Retry the failed update)
> RecordA, updatedAt = 11:00 (should drop the record since it is stale)
>
> I know Delta Lake can update based on conditions, so I could use the
> updatedAt timestamp as the key. But how does Hudi do data reconciliation?
>
> Best regards,
> Bill
>
