Hi Balaji,

Regarding the plan for "Efficient Migration of Large Parquet Tables to Apache Hudi", have you split the plan into multiple subtasks?

Thanks,
Nicholas
At 2019-12-14 00:18:12, "Vinoth Chandar" <[email protected]> wrote:
>+1 (per ASF policy)
>
>+100 per my own excitement :) .. Happy to review this!
>
>On Fri, Dec 13, 2019 at 3:07 AM Balaji Varadarajan <[email protected]>
>wrote:
>
>> With Apache Hudi growing in popularity, one of the fundamental challenges
>> for users has been efficiently migrating their historical datasets to
>> Apache Hudi. Apache Hudi maintains per-record metadata to perform core
>> operations such as upserts and incremental pull. To take advantage of
>> Hudi's upsert and incremental processing support, users would need to
>> rewrite their whole dataset to make it a Hudi table. This RFC provides a
>> mechanism to efficiently migrate datasets without the need to rewrite
>> the entire dataset.
>>
>> Please find the link for the RFC below.
>>
>> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+12+%3A+Efficient+Migration+of+Large+Parquet+Tables+to+Apache+Hudi
>>
>> Please review and let me know your thoughts.
>>
>> Thanks,
>> Balaji.V
