With Apache Hudi growing in popularity, one of the fundamental challenges for users has been efficiently migrating their historical datasets to Apache Hudi. Apache Hudi maintains per-record metadata to perform core operations such as upserts and incremental pull. To take advantage of Hudi’s upsert and incremental processing support, users currently need to rewrite their entire dataset as a Hudi table. This RFC proposes a mechanism for migrating existing datasets efficiently, without rewriting the entire dataset.
Please find the link for the RFC below.

https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi

Please review and let me know your thoughts.

Thanks,
Balaji.V
