To Raymond: right now combineAndGetUpdateValue can return only one IndexedRecord, but in the SCD-2 case both the old and the new record need to be stored.

To Alexey: yeah, this feature should be designed on top of RFC-46. Can HoodieRecordMerger return two HoodieRecords in this case?
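
To make the question concrete, here is a rough sketch of the merge described in the proposal quoted below. It is plain Python rather than Hudi code: the function name scd2_merge, the dict-based records, and the "name" field are made up for illustration, and nothing here reflects the actual HoodieRecordPayload or HoodieRecordMerger signatures. The point is only that a single merge has to emit two records.

```python
from typing import Dict, List

# Sentinel end_date that marks the current (open) version of a record.
OPEN_END_DATE = "2099-12-31"


def scd2_merge(current: Dict[str, str],
               incoming: Dict[str, str],
               effective_date: str) -> List[Dict[str, str]]:
    """Hypothetical SCD-2 merge: return both records that must be stored."""
    if current["id"] != incoming["id"]:
        raise ValueError("records must share the same id")
    # Close the existing version by replacing its open-ended end_date.
    closed = {**current, "end_date": effective_date}
    # The incoming record becomes the new open version.
    opened = {**incoming, "end_date": OPEN_END_DATE}
    return [closed, opened]


# Example from the proposal: id=1 is current until a new record arrives.
old = {"id": "1", "name": "a", "end_date": OPEN_END_DATE}
new = {"id": "1", "name": "b"}
result = scd2_merge(old, new, "2022-10-21")
```

The list returned here is exactly what a single-record return type cannot express, which is why I am asking whether the merger can return more than one record.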

On Tue, 25 Oct 2022 at 03:55, Alexey Kudinkin <akudin...@apache.org> wrote:

> Hey, hey, Fengjian!
>
> With the landing of RFC-46 we'll be kick-starting the process of phasing
> out HoodieRecordPayload as an abstraction and migrating to the
> HoodieRecordMerger interface instead.
> I'd recommend basing your design on the new HoodieRecordMerger interface
> rather than the legacy HoodieRecordPayload, to make sure it's future-proof.
>
> On Thu, Oct 20, 2022 at 10:08 AM 冯健 <fengjian...@gmail.com> wrote:
>
> > Hi guys,
> > After reading this article on how to implement SCD-2 with Hudi,
> > "Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and
> > Apache Hudi on Amazon EMR"
> > <https://aws.amazon.com/blogs/big-data/build-slowly-changing-dimensions-type-2-scd2-with-apache-spark-and-apache-hudi-on-amazon-emr/>,
> > I have an idea about implementing embedded SCD-2 support in Hudi by
> > using a new Payload. Users would not need to manually join the data and
> > then update end_date and status.
> > For example, say the record key is 'id,end_date', the current record's
> > id is 1, and its end_date is 2099-12-31. When a new record with id=1
> > arrives, the Payload updates the current record's end_date to
> > 2022-10-21 and also inserts the new record with end_date '2099-12-31'.
> > So this Payload will generate two records in combineAndGetUpdateValue.
> > There is no join cost, and the whole process is transparent to users.
> >
> > Any thoughts?