Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

vino yang Sun, 23 Feb 2020 18:27:55 -0800

Hi Sivabalan,

Thanks for your proposal.


Big +1 from my side. Indexing at record granularity is really good for
performance, and it is also a step toward stream processing.

Best,
Vino

Sivabalan <n.siv...@gmail.com> wrote on Sun, Feb 23, 2020 at 12:52 AM:

> As Apache Hudi is getting widely adopted, performance has become the need
> of the hour. This RFC focusses on improving performance of the Hudi index
> by introducing a record-level index. The proposal is to implement a new index
> format that is a mapping of (recordKey <-> partition, fileId) or
> ((recordKey, partitionPath) → fileId). This mapping will be stored and
> maintained by Hudi as another implementation of HoodieIndex. This record
> level indexing will definitely give a boost to both read and write
> performance.
>
> Here is the link to the RFC:
> <https://cwiki.apache.org/confluence/display/HUDI/RFC+-+08+%3A+Record+level+indexing+mechanisms+for+Hudi+datasets>
>
> Appreciate your review and thoughts.
>
> --
> Regards,
> -Sivabalan
>
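
For readers skimming the thread, the two mapping shapes named in the quoted
proposal can be pictured with a minimal in-memory sketch. This is only an
illustration under stated assumptions: the class and method names below
(RecordLocation, GlobalRecordIndex, PartitionedRecordIndex, upsert, lookup)
are hypothetical, a plain dict stands in for whatever storage format the RFC
ends up choosing, and none of it is Hudi's HoodieIndex API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RecordLocation:
    """Where a record lives: a partition path plus a file id (hypothetical type)."""
    partition_path: str
    file_id: str


class GlobalRecordIndex:
    """Sketch of the recordKey -> (partition, fileId) mapping.

    The key is unique across the whole dataset, so a lookup needs only the key.
    """

    def __init__(self) -> None:
        self._locations: Dict[str, RecordLocation] = {}

    def upsert(self, record_key: str, partition_path: str, file_id: str) -> None:
        # A later write for the same key overwrites the earlier location.
        self._locations[record_key] = RecordLocation(partition_path, file_id)

    def lookup(self, record_key: str) -> Optional[RecordLocation]:
        return self._locations.get(record_key)


class PartitionedRecordIndex:
    """Sketch of the (recordKey, partitionPath) -> fileId mapping.

    The same record key may exist in several partitions, so the partition
    path is part of the lookup key.
    """

    def __init__(self) -> None:
        self._file_ids: Dict[Tuple[str, str], str] = {}

    def upsert(self, record_key: str, partition_path: str, file_id: str) -> None:
        self._file_ids[(record_key, partition_path)] = file_id

    def lookup(self, record_key: str, partition_path: str) -> Optional[str]:
        return self._file_ids.get((record_key, partition_path))
```

With either shape, an incoming record can be routed to the file that already
holds its key through a single keyed lookup, and a key that has never been
written simply returns None. How the mapping is persisted and kept up to date
is the subject of the RFC itself and is not modeled here.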
