Hi Jialun,
There is no documentation for this case outside the Javadocs 
(https://issues.apache.org/jira/browse/HUDI-1277).  The payload interfaces are 
themselves first-class citizens of Hudi ( 
https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java).
 
We will add generic support for this case 
(https://issues.apache.org/jira/browse/HUDI-1278). You can write an 
implementation specific to your case, or you can contribute to HUDI-1278 
and I can work with you to get this landed.
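
As a rough illustration of what a case-specific implementation might do for the updatedAt example in this thread, here is a minimal sketch in plain Java. It runs without Hudi on the classpath and does not implement HoodieRecordPayload. The class name TimestampedPayload and the methods dedupWithinBatch and mergeWithStored are hypothetical names chosen for this sketch. A real payload class would presumably apply the same timestamp comparison inside the interface linked above.

```java
import java.time.LocalTime;

// Hypothetical stand-in for a custom payload. It is NOT Hudi's
// HoodieRecordPayload; it only models the timestamp comparison.
final class TimestampedPayload {
    final String key;
    final LocalTime updatedAt;

    TimestampedPayload(String key, LocalTime updatedAt) {
        this.key = key;
        this.updatedAt = updatedAt;
    }

    // Dedup within one incoming batch: return whichever of the two
    // records has the later updatedAt (on a tie, this record is kept).
    TimestampedPayload dedupWithinBatch(TimestampedPayload other) {
        return other.updatedAt.isAfter(this.updatedAt) ? other : this;
    }

    // Merge an incoming record (this) with the record already stored.
    // Returns the record that should remain in the table: the incoming
    // one only if it is strictly newer, otherwise the stored one, so a
    // stale retry is effectively dropped.
    TimestampedPayload mergeWithStored(TimestampedPayload stored) {
        if (stored == null) {
            return this; // nothing stored yet: treat as an insert
        }
        return this.updatedAt.isAfter(stored.updatedAt) ? this : stored;
    }
}
```

With the records from the original question, merging the 12:00 retry (updatedAt = 11:00) against the stored record (updatedAt = 11:30) returns the stored record, so the retry has no effect.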
Thanks,
Balaji.V





On Thursday, September 10, 2020, 11:05:44 AM PDT, Jialun Liu <[email protected]> wrote:

Hey Gary,

Thanks for replying so quickly!

Could you please point me to the documentation of this feature? I would
love to take a closer look at it, thanks!

Best regards,
Bill

On Thu, Sep 10, 2020 at 12:20 AM Gary Li <[email protected]> wrote:

> Hello,
> Yes, this feature is supported by Hudi. You can write your own payload
> class to handle precombine (dedup within delta) and
> updateHistoryRecord (delta merge with history). The default payload is
> updateWithLatestRecord.
>
> Gary Li
> ________________________________
> From: Jialun Liu <[email protected]>
> Sent: Thursday, September 10, 2020 1:28:09 PM
> To: [email protected] <[email protected]>
> Subject: Apache Hudi Data Reconciliation
>
> Hey guys,
>
> I want to confirm whether Apache Hudi can handle data reconciliation
> for use cases like late records, out-of-order records, retries,
> etc.
>
> A simple example:
> @11:00
> RecordA, updatedAt = 11:00 (failed to update)
>
> @11:30
> RecordA, updatedAt = 11:30 (success)
>
> @12:00 (Retry the failed update)
> RecordA, updatedAt = 11:00 (should drop the record since it is stale)
>
> I know Delta Lake can update based on conditions, so I could use the
> updatedAt timestamp as the key. But how does Hudi do data reconciliation?
>
> Best regards,
> Bill
>
  
