[DISCUSS] RFC-12 : Efficient migration of large parquet tables to Apache Hudi

Balaji Varadarajan Fri, 13 Dec 2019 03:07:46 -0800

With Apache Hudi growing in popularity, one of the fundamental challenges
for users has been efficiently migrating their historical datasets to
Apache Hudi. Apache Hudi maintains per-record metadata to perform core
operations such as upserts and incremental pull. Today, to take advantage
of Hudi’s upsert and incremental processing support, users need to rewrite
their whole dataset to make it a Hudi table. This RFC proposes a mechanism
to migrate such datasets efficiently, without rewriting the entire dataset.


Please find the link to the RFC below.

https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi

Please review and let me know your thoughts.

Thanks,
Balaji.V
