Re: [Phishing Risk] [External] is there solution to solve hbase data screw issue

Vinoth Chandar Thu, 14 Oct 2021 09:50:01 -0700

Yeah, all the rate-limiting code in HBaseIndex is there to work around these
large bulk writes.


On Tue, Oct 5, 2021 at 11:16 AM Jian Feng <jian.f...@shopee.com> wrote:

> Actually, I hit this problem when bootstrapping a huge table. After I changed
> the region key split strategy, the problem was solved.
> I'm glad to hear that the HFile solution will work in the future. Since the
> bloom index cannot index MOR log files, newly inserted data is still written
> to Parquet; that is why I chose the HBase index, which gives better performance.
>
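The message above does not say which split strategy was used. As a minimal sketch of one common approach, assuming you can sample the row keys before creating the table, you can pick split keys at evenly spaced quantiles of the sorted sample so that each region receives a similar share of keys even when every key starts with 'itemid:'. The function names split_points and region_for are hypothetical and are not part of Hudi or HBase.

```python
from bisect import bisect_right


def split_points(sample_keys, num_regions):
    """Pick num_regions - 1 split keys at evenly spaced quantiles of a key sample."""
    keys = sorted(set(sample_keys))  # HBase orders row keys lexicographically by bytes
    if num_regions < 2 or len(keys) < num_regions:
        return []  # not enough distinct keys to split meaningfully
    step = len(keys) / num_regions
    return [keys[int(i * step)] for i in range(1, num_regions)]


def region_for(rowkey, splits):
    """Index of the region whose [start, end) key range contains rowkey."""
    return bisect_right(splits, rowkey)


# Illustrative data only: 1000 keys that all share the 'itemid:' prefix.
sample = [f"itemid:{i}" for i in range(1000)]
splits = split_points(sample, 10)
counts = [0] * 10
for key in sample:
    counts[region_for(key, splits)] += 1
```

The resulting split keys would then be passed to HBase when the table is created. Python string comparison matches HBase byte ordering only for ASCII keys, which is an assumption of this sketch.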
> On Tue, Oct 5, 2021 at 7:29 PM Vinoth Chandar <vin...@apache.org> wrote:
>
> > +1 on that answer. It's pretty spot on.
> >
> > Even though a random prefix helps with HBase balancing, the issue then
> > becomes that you lose all the key ordering inside the Hudi table, and that
> > ordering is nice to have if you ever want range pruning/indexing to be
> > effective.
> >
> > To paint a picture of the work being done in this area: this work,
> > driven by Uber engineers, https://github.com/apache/hudi/pull/3508 could
> > technically solve the issue by reading HFiles directly
> > for the indexing, without going to the HBase servers. But obviously, it could
> > be less performant than HBase for small upsert batches (given that the region
> > servers will cache, etc.).
> > If your backing storage is cloud/object storage, which again throttles by
> > prefix, then we could run into the same hotspotting problem again.
> > Otherwise, for larger batches, this would be far more scalable.
> >
> >
> > On Mon, Oct 4, 2021 at 7:06 PM 管梓越 <guanziyue....@bytedance.com> wrote:
> >
> > > Hi Jianfeng,
> > >       As far as I know, there may not be a solution on the Hudi side yet.
> > > However, I have hit this problem before, so I hope my experience helps.
> > > As with other uses of HBase, adding a random prefix to the rowkey may be
> > > the most universal solution to this problem.
> > > We can change the primary key for Hudi by adding such a prefix before the
> > > data is ingested into Hudi. A new column can be added to store the original
> > > primary key for queries, hiding the Hudi pk.
> > > Alternatively, we can make a small modification to the HBase index: copy the
> > > code of the HBase index and add the prefix on the query and update paths to
> > > HBase. This way, the pk in HBase differs from the one in Hudi, but such
> > > logic is transparent to the business logic. I have adopted this method in
> > > our prod environment. The withIndexClass config in IndexConfig can specify
> > > a custom index, which allows changing the index without recompiling the
> > > whole Hudi project.
> > >
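To make the prefix idea concrete, here is a minimal sketch. It assumes the prefix is derived from a hash of the record key rather than being truly random, so that the same prefix can be recomputed at lookup time. The bucket count, the "_" separator, and the function names are hypothetical choices for illustration and do not come from the Hudi code.

```python
import hashlib

NUM_BUCKETS = 100  # hypothetical: roughly one bucket per region server


def salted_rowkey(record_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Prepend a deterministic, hash-derived bucket prefix to the record key."""
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    width = len(str(num_buckets - 1))  # zero-pad so prefixes sort consistently
    return f"{bucket:0{width}d}_{record_key}"


def original_key(rowkey: str) -> str:
    """Strip the bucket prefix to recover the Hudi record key."""
    return rowkey.split("_", 1)[1]


# Illustrative data only: keys sharing the 'itemid:' prefix spread over buckets.
keys = [f"itemid:{i}" for i in range(10000)]
buckets_used = {salted_rowkey(k).split("_", 1)[0] for k in keys}
```

Because the prefix depends only on the key, both the write path and the lookup path can compute the same HBase rowkey. The trade-off is that adjacent record keys no longer sit next to each other in HBase.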
> > > On Mon, Oct 4, 2021, 11:29 PM <jian.f...@shopee.com> wrote:
> > > When I bootstrapped a huge HBase index table, I found that all keys have
> > > the prefix 'itemid:', which caused data skew: there are 100 region servers
> > > in HBase, but only one was handling data. Is there any way to avoid this
> > > issue on the Hudi side?
> > > --
> > > *Jian Feng,冯健* Shopee | Engineer | Data Infrastructure
> > >
> >
