Hi Balaji,

Regarding the plan for "Efficient Migration of Large Parquet Tables to Apache Hudi", have you split the plan into multiple subtasks?

Thanks,
Nicholas
At 2019-12-14 00:18:12, "Vinoth Chandar" <[email protected]> wrote:
>+1 (per ASF policy)
>
>+100 per my own excitement :) .. Happy to review this!
>
>On Fri, Dec 13, 2019 at 3:07 AM Balaji Varadarajan <[email protected]>
>wrote:
>
>> With Apache Hudi growing in popularity, one of the fundamental challenges
>> for users has been efficiently migrating their historical datasets to
>> Apache Hudi. Apache Hudi maintains per-record metadata to perform core
>> operations such as upserts and incremental pull. To take advantage of
>> Hudi's upsert and incremental processing support, users would need to
>> rewrite their whole dataset to make it a Hudi table. This RFC provides a
>> mechanism to efficiently migrate datasets without the need to rewrite
>> the entire dataset.
>>
>> Please find the link for the RFC below.
>>
>> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi
>>
>> Please review and let me know your thoughts.
>>
>> Thanks,
>> Balaji.V
