Interesting thoughts. I'm not sure I fully understand this part: "generate 2 records in combineAndGetUpdateValue". Isn't the API defined to return just one record?
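
To make the question concrete, here is a minimal Python sketch of the SCD-2 transition described in the quoted proposal. It is not Hudi code and does not use any Hudi API; the function name `apply_scd2_update`, the constant `OPEN_END_DATE`, and the `value` column are hypothetical names introduced only for illustration. It simply shows that one incoming update produces two output rows, which is the part that seems hard to fit into a method that returns a single record.

```python
from datetime import date
from typing import Dict, List

# Sentinel end_date from the proposal that marks the current (open) version.
OPEN_END_DATE = "2099-12-31"


def apply_scd2_update(current: Dict, incoming: Dict, today: date) -> List[Dict]:
    """Hypothetical helper: return the two rows an SCD-2 update produces.

    The first row is the existing version with its end_date closed at `today`;
    the second row is the incoming version, opened with the sentinel end_date.
    """
    closed = {**current, "end_date": today.isoformat()}
    opened = {**incoming, "end_date": OPEN_END_DATE}
    return [closed, opened]


# Example from the proposal: id=1 is current until a new record arrives on 2022-10-21.
current = {"id": 1, "value": "old", "end_date": OPEN_END_DATE}
incoming = {"id": 1, "value": "new"}
rows = apply_scd2_update(current, incoming, date(2022, 10, 21))
for row in rows:
    print(row)
```

With the record key defined as 'id,end_date', the closed row also ends up with a different key than the row it was derived from, so the update touches two keys rather than one.
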

On Fri, Oct 21, 2022 at 1:07 AM 冯健 <[email protected]> wrote:

> Hi guys,
>
> After reading this article on how to implement SCD-2 with Hudi,
> "Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and
> Apache Hudi on Amazon EMR"
> <https://aws.amazon.com/blogs/big-data/build-slowly-changing-dimensions-type-2-scd2-with-apache-spark-and-apache-hudi-on-amazon-emr/>,
> I have an idea about implementing embedded SCD-2 support in Hudi by
> using a new Payload. Users would not need to manually join the data and
> then update end_date and status.
>
> For example, the record key is 'id,end_date'. Let's say the current
> record's id is 1 and its end_date is 2099-12-31. When a new record with
> id=1 arrives, the Payload will update the current record's end_date to
> 2022-10-21 and also insert the new record with end_date '2099-12-31'.
> So this Payload will generate two records in combineAndGetUpdateValue.
> There will be no join cost, and the whole process is transparent to users.
>
> Any thoughts?

--
Best,
Shiyan
