Re: hbase slack
Invitation sent. Please check. Thanks.

Bryan Beaudreault wrote on Tue, May 18, 2021 at 10:21 AM:
> That's weird! Can you invite bbeaudrea...@gmail.com?
>
> On Mon, May 17, 2021 at 9:39 PM 张铎(Duo Zhang) wrote:
> > Could you please give us an email address?
> >
> > bbeaudrea...@hubspot.com.invalid
> >
> > Is this the expected one? There is an 'invalid' in it...
> >
> > Bryan Beaudreault wrote on Tue, May 18, 2021 at 4:04 AM:
> > > Is there an existing user group slack of hbase users? If so can I have an invite?
> > >
> > > Thanks!
Re: hbase slack
That's weird! Can you invite bbeaudrea...@gmail.com?

On Mon, May 17, 2021 at 9:39 PM 张铎(Duo Zhang) wrote:
> Could you please give us an email address?
>
> bbeaudrea...@hubspot.com.invalid
>
> Is this the expected one? There is an 'invalid' in it...
>
> Bryan Beaudreault wrote on Tue, May 18, 2021 at 4:04 AM:
> > Is there an existing user group slack of hbase users? If so can I have an invite?
> >
> > Thanks!
Re: HotSpot detection/mitigation worker?
I think that no matter how well a balancer cost function is written, it cannot compensate for a suboptimal row key design. Say, for example, you have 10 regionservers and 100 regions, and your application is heavy on the latest data, which lands mostly on 1 or 2 regions. No matter how many splits and/or merges you do, it becomes very hard to balance the load among the regionservers.

Here is how we have solved this problem for our clients. It might not work for existing clients, but it can be a consideration for new ones. Every request with a row key goes through an enrichment process, which prefixes the key with a hash (from, say, murmurhash) based on the client-requested distribution (this stays fixed throughout the lifetime of that table for that client). We also wrote an HBase client abstraction to take care of this in a seamless manner for our clients.

Example: actual row key --> *0QUPHSBTLGM*; a client that requested a 3-digit prefix based on the table region range (000 - 999) would get *115-0QUPHSBTLGM* with murmurhash.

---
Mallikarjun

On Tue, May 18, 2021 at 1:33 AM Bryan Beaudreault wrote:
> Hey all,
>
> We run a bunch of big hbase clusters that get used by hundreds of product
> teams for a variety of real-time workloads. We are a B2B company, so most
> data has a customerId somewhere in the rowkey. As the team that owns the
> hbase infrastructure, we try to help product teams properly design schemas
> to avoid hotspotting, but inevitably it happens. It may not necessarily
> just be hotspotting, but for example request volume may not be evenly
> distributed across all regions of a table.
>
> This hotspotting/distribution issue makes it hard for the balancer to keep
> the cluster balanced from a load perspective -- sure, all RS have the same
> number of regions, but those regions are not all created equal from a load
> perspective. This results in cases where one RS might be consistently at
> 70% cpu, another might be at 30%, and all the rest are in a band
> in-between.
>
> We already have a normalizer job which works similarly to the
> SimpleRegionNormalizer -- keeping regions approximately the same size from
> a data size perspective. I'm considering updating our normalizer to also
> take into account region load.
>
> My general plan is to follow a similar strategy to the balancer -- keep a
> configurable number of RegionLoad objects in memory per-region, and extract
> averages for readRequestsCount from those. If a region's average load is >
> some threshold relative to other regions in the same table, split it. If
> it's < some threshold relative to other regions in the same table, merge
> it.
>
> I'm writing because I'm wondering if anyone else has had this problem and
> if there exists prior art here. Is there a reason HBase does not provide a
> configurable load-based normalizer (beyond typical OSS reasons -- no one
> contributed it yet)?
>
> Thanks!
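The hash-prefix ("salting") scheme described above can be sketched roughly as follows. This is an illustrative sketch, not the actual client abstraction from the thread: murmurhash is not in the Python standard library, so zlib.crc32 stands in as the stable hash, and the function names are hypothetical.

```python
import zlib


def salt_row_key(row_key: str, prefix_range: int = 1000) -> str:
    """Prefix a row key with a stable hash-derived bucket so writes
    spread across a pre-split region range (e.g. 000-999)."""
    # Any stable hash works; the thread mentions murmurhash. zlib.crc32
    # is used here only because it is in the standard library.
    bucket = zlib.crc32(row_key.encode("utf-8")) % prefix_range
    width = len(str(prefix_range - 1))  # 000-999 -> 3-digit prefix
    return f"{bucket:0{width}d}-{row_key}"


def strip_salt(salted_key: str) -> str:
    """Recover the original key. Since the prefix is deterministic,
    point reads can also recompute it from the unsalted key."""
    return salted_key.split("-", 1)[1]
```

Because the prefix is a pure function of the row key, the mapping never changes for the lifetime of the table, so the client abstraction can salt on write and re-derive the same prefix on read; the trade-off is that range scans over the original key order must fan out across all prefix buckets.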
Re: hbase slack
Could you please give us an email address?

bbeaudrea...@hubspot.com.invalid

Is this the expected one? There is an 'invalid' in it...

Bryan Beaudreault wrote on Tue, May 18, 2021 at 4:04 AM:
> Is there an existing user group slack of hbase users? If so can I have an invite?
>
> Thanks!
hbase slack
Is there an existing user group slack of hbase users? If so can I have an invite? Thanks!
HotSpot detection/mitigation worker?
Hey all,

We run a bunch of big hbase clusters that get used by hundreds of product teams for a variety of real-time workloads. We are a B2B company, so most data has a customerId somewhere in the rowkey. As the team that owns the hbase infrastructure, we try to help product teams properly design schemas to avoid hotspotting, but inevitably it happens. It may not necessarily just be hotspotting, but for example request volume may not be evenly distributed across all regions of a table.

This hotspotting/distribution issue makes it hard for the balancer to keep the cluster balanced from a load perspective -- sure, all RS have the same number of regions, but those regions are not all created equal from a load perspective. This results in cases where one RS might be consistently at 70% cpu, another might be at 30%, and all the rest are in a band in-between.

We already have a normalizer job which works similarly to the SimpleRegionNormalizer -- keeping regions approximately the same size from a data size perspective. I'm considering updating our normalizer to also take into account region load.

My general plan is to follow a similar strategy to the balancer -- keep a configurable number of RegionLoad objects in memory per-region, and extract averages for readRequestsCount from those. If a region's average load is > some threshold relative to other regions in the same table, split it. If it's < some threshold relative to other regions in the same table, merge it.

I'm writing because I'm wondering if anyone else has had this problem and if there exists prior art here. Is there a reason HBase does not provide a configurable load-based normalizer (beyond typical OSS reasons -- no one contributed it yet)?

Thanks!
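The averaging-and-threshold plan above can be sketched as follows. This is a hypothetical illustration of the decision logic only, not HBase code: `plan_actions` and the ratio parameters are invented names, and the per-region sample lists stand in for the sliding window of RegionLoad `readRequestsCount` values the email describes.

```python
from statistics import mean


def plan_actions(region_loads, split_ratio=2.0, merge_ratio=0.5):
    """Decide which regions of one table to split (hot) or merge (cold).

    region_loads: dict mapping region name -> list of recent
    readRequestsCount samples (the per-region sliding window).
    A region is split if its average load exceeds split_ratio times
    the table-wide average, and merged if it falls below merge_ratio
    times the table-wide average.
    """
    avg = {name: mean(samples) for name, samples in region_loads.items()}
    table_avg = mean(avg.values())
    plan = {}
    for name, load in avg.items():
        if load > split_ratio * table_avg:
            plan[name] = "split"
        elif load < merge_ratio * table_avg:
            plan[name] = "merge"
    return plan
```

Comparing each region against the average of its own table (rather than the whole cluster) keeps a uniformly hot table from being split indefinitely; the ratios would be the "configurable threshold" knobs.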
Re: [ANNOUNCE] New HBase Committer Xiaolin Ha(哈晓琳)
Congratulations, Xiaolin, and thank you for all your contributions!!

On Sat, May 15, 2021 at 7:11 AM 张铎(Duo Zhang) wrote:
> On behalf of the Apache HBase PMC, I am pleased to announce that Xiaolin
> Ha(sunhelly) has accepted the PMC's invitation to become a committer on the
> project. We appreciate all of Xiaolin's generous contributions thus far and
> look forward to her continued involvement.
>
> Congratulations and welcome, Xiaolin Ha!
>
> [The same announcement in Chinese:] On behalf of the Apache HBase PMC, I am
> pleased to announce that Xiaolin Ha has accepted our invitation to become a
> Committer on the Apache HBase project. Thank you, Xiaolin Ha, for your
> contributions to the HBase project, and we look forward to you taking on
> more responsibilities in the future.
>
> Welcome, Xiaolin Ha!