Hey Balaji,

Thanks for your help! I am new to Apache Hudi and am slowly exploring things. I will reach out to you if I have something I could contribute back to the community.

Best regards,
Bill

On Sat, Sep 12, 2020 at 3:51 PM Balaji Varadarajan <[email protected]> wrote:

> Hi Jialun,
>
> There is no outside documentation for this case except Javadocs
> (https://issues.apache.org/jira/browse/HUDI-1277). The payload interfaces
> are themselves first-class citizens of Hudi
> (https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java).
> We will add generic support for this case
> (https://issues.apache.org/jira/browse/HUDI-1278). You can write a
> specific implementation for your case, or you can contribute to
> HUDI-1278 and I can work with you to get this landed.
>
> Thanks,
> Balaji.V
>
> On Thursday, September 10, 2020, 11:05:44 AM PDT, Jialun Liu <[email protected]> wrote:
>
> Hey Gary,
>
> Thanks for replying so quickly!
>
> Could you please point me to the documentation of this feature? I would
> love to take a closer look at it, thanks!
>
> Best regards,
> Bill
>
> On Thu, Sep 10, 2020 at 12:20 AM Gary Li <[email protected]> wrote:
>
> > Hello.
> >
> > Yes, this feature is supported by Hudi. You can write your own payload
> > class to handle precombine (dedup within delta) and
> > updateHistoryRecord (delta merge with history). The default payload is
> > updateWithLatestRecord.
> >
> > Gary Li
> > ________________________________
> > From: Jialun Liu <[email protected]>
> > Sent: Thursday, September 10, 2020 1:28:09 PM
> > To: [email protected] <[email protected]>
> > Subject: Apache Hudi Data Reconciliation
> >
> > Hey guys,
> >
> > I want to confirm whether Apache Hudi can handle data reconciliation
> > for use cases like late records, out-of-order records, retries, etc.
> >
> > A simple example:
> >
> > @11:00
> > RecordA, updatedAt = 11:00 (failed to update)
> >
> > @11:30
> > RecordA, updatedAt = 11:30 (success)
> >
> > @12:00 (Retry the failed update)
> > RecordA, updatedAt = 11:00 (should drop the record since it is stale)
> >
> > I know Delta Lake can update based on conditions, so that I can use the
> > updatedAt timestamp as the key. But how does Hudi do data reconciliation?
> >
> > Best regards,
> > Bill
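
The reconciliation rule discussed in this thread (dedup within the incoming batch, then merge against what is already stored, dropping stale records) can be sketched in plain Java. This is only an illustration of the logic, not Hudi code: VersionedRecord and LatestWinsTable are hypothetical names, the table is an in-memory map, and nothing here implements the actual HoodieRecordPayload interface. To my understanding, the two steps correspond to the preCombine and combineAndGetUpdateValue methods of the interface linked above, which is where a custom payload class would apply the same comparison.

```java
import java.time.LocalTime;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical record type for illustration; not a Hudi class. */
final class VersionedRecord {
    final String key;
    final LocalTime updatedAt;
    final String value;

    VersionedRecord(String key, LocalTime updatedAt, String value) {
        this.key = key;
        this.updatedAt = updatedAt;
        this.value = value;
    }
}

/** Hypothetical in-memory table applying a "latest updatedAt wins" rule. */
final class LatestWinsTable {
    private final Map<String, VersionedRecord> stored = new HashMap<>();

    /** Dedup within one incoming batch: keep the later record; on a tie keep the first. */
    static VersionedRecord preCombine(VersionedRecord a, VersionedRecord b) {
        return b.updatedAt.isAfter(a.updatedAt) ? b : a;
    }

    /** Merge an incoming record with the stored one: a strictly older incoming record is dropped. */
    static VersionedRecord mergeWithStored(VersionedRecord current, VersionedRecord incoming) {
        if (current == null) {
            return incoming;
        }
        return incoming.updatedAt.isBefore(current.updatedAt) ? current : incoming;
    }

    /** Upsert by key; Map.merge passes (stored value, incoming value) to the merge function. */
    void upsert(VersionedRecord incoming) {
        stored.merge(incoming.key, incoming, LatestWinsTable::mergeWithStored);
    }

    VersionedRecord get(String key) {
        return stored.get(key);
    }
}
```

Replaying the example from the original question: after upserting RecordA with updatedAt = 11:30 and then the retried RecordA with updatedAt = 11:00, the table still holds the 11:30 version, because the retry is older than the stored record and is discarded.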
