To Raymond: currently combineAndGetUpdateValue can only return one
IndexedRecord, but in the case of SCD-2, both the old and the new record
need to be stored.
To Alexey: yes, this feature should be designed on top of RFC-46. Can
HoodieRecordMerger return two HoodieRecords in this case?
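
To make the two-record case concrete, here is a minimal sketch in plain
Python. It is not Hudi code and does not use HoodieRecordPayload or
HoodieRecordMerger; the function name scd2_merge, the dict-based records,
and the "name" field are assumptions made up for illustration. It only
shows what a single merge would have to emit for SCD-2:

```python
from datetime import date

# Sentinel end_date marking the currently valid version of a record
# (value taken from the example in the original proposal).
OPEN_END = date(2099, 12, 31)


def scd2_merge(current, incoming, effective):
    """Return the two records an SCD-2 upsert needs to store.

    current   -- the open version of the record (end_date == OPEN_END)
    incoming  -- the new attribute values for the same id
    effective -- the date on which the new version takes effect
    """
    if current["id"] != incoming["id"]:
        raise ValueError("current and incoming must share the same id")

    # 1. Close the old version by replacing its open end_date.
    closed = {**current, "end_date": effective}
    # 2. Open the new version with the sentinel end_date.
    opened = {**incoming, "end_date": OPEN_END}
    return [closed, opened]


old = {"id": 1, "name": "a", "end_date": OPEN_END}
new = {"id": 1, "name": "b"}
for record in scd2_merge(old, new, date(2022, 10, 21)):
    print(record)
```

Note that with 'id,end_date' as the record key, the closed version ends up
under a different key (1, 2022-10-21), while the incoming record takes over
the key (1, 2099-12-31). So the merge has to return one update and one
insert, which is why a single return value is not enough.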



On Tue, 25 Oct 2022 at 03:55, Alexey Kudinkin <akudin...@apache.org> wrote:

> Hey, hey, Fengjian!
>
> With the landing of RFC-46 we'll be kick-starting the process of phasing
> out HoodieRecordPayload as an abstraction and migrating to the
> HoodieRecordMerger interface instead.
> I'd recommend basing your design considerations on the new
> HoodieRecordMerger interface rather than the legacy HoodieRecordPayload to
> make sure it's future-proof.
>
> On Thu, Oct 20, 2022 at 10:08 AM 冯健 <fengjian...@gmail.com> wrote:
>
> > Hi guys,
> >     After reading this article on how to implement SCD-2 with Hudi,
> > "Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and
> > Apache Hudi on Amazon EMR"
> > <https://aws.amazon.com/blogs/big-data/build-slowly-changing-dimensions-type-2-scd2-with-apache-spark-and-apache-hudi-on-amazon-emr/>,
> >     I have an idea about implementing embedded SCD-2 support in Hudi by
> > using a new Payload. Users won't need to manually join the data and then
> > update end_date and status.
> >    For example, say the record key is 'id,end_date', the current record's
> > id is 1, and its end_date is 2099-12-31. When a new record with id=1
> > arrives, the Payload will update the current record's end_date to
> > 2022-10-21 and also insert the new record with end_date '2099-12-31'. So
> > this Payload will generate two records in combineAndGetUpdateValue. There
> > will be no join cost, and the whole process is transparent to users.
> >
> >    Any thoughts?
> >
>
