Hi guys,
    After reading this article on how to implement SCD-2 with Hudi,
"Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and
Apache Hudi on Amazon EMR"
<https://aws.amazon.com/blogs/big-data/build-slowly-changing-dimensions-type-2-scd2-with-apache-spark-and-apache-hudi-on-amazon-emr/>
    I have an idea for embedding SCD-2 support in Hudi by using a new
Payload, so that users don't need to manually join the data and then
update end_date and status.
   For example, say the record key is 'id,end_date' and the current
record has id=1 and end_date=2099-12-31. When a new record with id=1
arrives, the Payload updates the current record's end_date to 2022-10-21
and also inserts the new record with end_date '2099-12-31'. So this Payload
will generate two records in combineAndGetUpdateValue. There will be no
join cost, and the whole process is transparent to users.
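
   To make the example concrete, here is a rough sketch of the merge logic
in plain Python. It is not Hudi code and does not use the Payload interface.
DimRecord, scd2_combine and the 'value' column are made-up names for
illustration, and the effective date is passed in explicitly as an
assumption about where 2022-10-21 would come from.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

# Sentinel end_date marking the current (open) version, as in the example.
OPEN_END_DATE = date(2099, 12, 31)


@dataclass(frozen=True)
class DimRecord:
    """Hypothetical dimension row; 'value' stands in for the real columns."""
    id: int
    value: str
    end_date: date


def scd2_combine(current: Optional[DimRecord],
                 incoming_id: int,
                 incoming_value: str,
                 effective: date) -> List[DimRecord]:
    """Return the rows to write when a new version of an id arrives.

    If an open version exists, it is closed at 'effective' and the
    incoming data becomes the new open version, so two rows come back.
    Otherwise only the new open version is returned.
    """
    new_version = DimRecord(incoming_id, incoming_value, OPEN_END_DATE)
    if current is None or current.end_date != OPEN_END_DATE:
        return [new_version]
    closed = replace(current, end_date=effective)
    return [closed, new_version]


if __name__ == "__main__":
    existing = DimRecord(1, "old", OPEN_END_DATE)
    for row in scd2_combine(existing, 1, "new", date(2022, 10, 21)):
        print(row)
```

   The point of the sketch is only that a single incoming record yields
two output rows (the closed old version and the new open one) without a
join on the caller's side.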

   Any thoughts?
