Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

Shiyan Xu Mon, 24 Feb 2020 18:37:01 -0800

+1. Great read, and a valuable proposal!

On Mon, 24 Feb 2020, 15:31 nishith agarwal, <[email protected]> wrote:


> +100
> - Reduces index lookup time hence improves job runtime
> - Paves the way for streaming style ingestion
> - Eliminates dependency on HBase (the alternate "global index" support at
> the moment)
>
> -Nishith
>
> On Mon, Feb 24, 2020 at 10:56 AM Vinoth Chandar <[email protected]> wrote:
>
> > +1 from me as well. This will be a product-defining feature if we can do
> > it.
> >
> > On Sun, Feb 23, 2020 at 6:27 PM vino yang <[email protected]> wrote:
> >
> > > Hi Sivabalan,
> > >
> > > Thanks for your proposal.
> > >
> > > Big +1 from my side. Record-level indexing is really good for
> > > performance, and it is also a step toward stream processing.
> > >
> > > Best,
> > > Vino
> > >
> > > Sivabalan <[email protected]> wrote on Sun, Feb 23, 2020 at 12:52 AM:
> > >
> > > > As Apache Hudi is getting widely adopted, performance has become the
> > > > need of the hour. This RFC focuses on improving performance of the
> > > > Hudi index by introducing a record-level index. The proposal is to
> > > > implement a new index format that is a mapping of
> > > > (recordKey <-> partition, fileId) or
> > > > ((recordKey, partitionPath) → fileId). This mapping will be stored
> > > > and maintained by Hudi as another implementation of HoodieIndex. This
> > > > record-level indexing will definitely give a boost to both read and
> > > > write performance.
> > > >
> > > > Here
> > > > <https://cwiki.apache.org/confluence/display/HUDI/RFC+-+08+%3A+Record+level+indexing+mechanisms+for+Hudi+datasets>
> > > > is the link to the RFC.
> > > >
> > > > Appreciate your review and thoughts.
> > > >
> > > > --
> > > > Regards,
> > > > -Sivabalan
> > > >
> > >
> >
>
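
The mapping described in the proposal, recordKey -> (partition, fileId), can
be pictured with a toy in-memory structure. This is only a minimal sketch for
illustration: the names RecordLocation, RecordLevelIndex, update and
tag_location are hypothetical, and it is neither Hudi's HoodieIndex interface
nor the storage format the RFC proposes.

```python
from typing import Dict, NamedTuple, Optional


class RecordLocation(NamedTuple):
    """Where a record lives: its partition path and the file group id."""
    partition_path: str
    file_id: str


class RecordLevelIndex:
    """Toy global index (hypothetical): recordKey -> (partitionPath, fileId)."""

    def __init__(self) -> None:
        self._locations: Dict[str, RecordLocation] = {}

    def update(self, record_key: str, partition_path: str, file_id: str) -> None:
        # Record (or overwrite) the location of a key after a write.
        self._locations[record_key] = RecordLocation(partition_path, file_id)

    def tag_location(self, record_key: str) -> Optional[RecordLocation]:
        # A single keyed lookup; returns None for keys never written (inserts).
        return self._locations.get(record_key)


index = RecordLevelIndex()
index.update("uuid-001", "2020/02/23", "file-a")
index.update("uuid-002", "2020/02/24", "file-b")
```

In this sketch an incoming record whose key is already present is routed to
its existing file as an update, and a key with no entry is treated as an
insert.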
