Re: hbase slack
Invitation sent. Please check. Thanks.

Bryan Beaudreault wrote on Tue, May 18, 2021 at 10:21 AM:
> That's weird! Can you invite bbeaudrea...@gmail.com?
>
> On Mon, May 17, 2021 at 9:39 PM 张铎(Duo Zhang) wrote:
> > Could you please give us an email address?
> >
> > bbeaudrea...@hubspot.com.invalid
> >
> > Is this the expected one? There is an 'invalid' in it...
> >
> > Bryan Beaudreault wrote on Tue, May 18, 2021 at 4:04 AM:
> > > Is there an existing user group slack of hbase users? If so can I have an invite?
> > >
> > > Thanks!
Re: hbase slack
That's weird! Can you invite bbeaudrea...@gmail.com?

On Mon, May 17, 2021 at 9:39 PM 张铎(Duo Zhang) wrote:
> Could you please give us an email address?
>
> bbeaudrea...@hubspot.com.invalid
>
> Is this the expected one? There is an 'invalid' in it...
>
> Bryan Beaudreault wrote on Tue, May 18, 2021 at 4:04 AM:
> > Is there an existing user group slack of hbase users? If so can I have an invite?
> >
> > Thanks!
Re: HotSpot detection/mitigation worker?
I think that no matter how well a balancer cost function is written, it cannot compensate for a suboptimal row key design. Say, for example, you have 10 regionservers and 100 regions, and your application is heavy on the latest data, which lands mostly on 1 or 2 regions. No matter how many splits and/or merges you do, it becomes very hard to balance the load among the regionservers.

Here is how we have solved this problem for our clients. It might not work for existing clients, but it can be a consideration for new ones. Every request with a row key goes through an enrichment process, which prefixes the key with a hash (from, say, murmurhash) based on the client-requested distribution (this stays fixed throughout the lifetime of that table for that client). We also wrote an HBase client abstraction to take care of this in a seamless manner for our clients.

Example: actual row key --> *0QUPHSBTLGM*; a client that requested a 3-digit prefix based on the table region range (000 - 999) would get *115-0QUPHSBTLGM* with murmurhash.

---
Mallikarjun

On Tue, May 18, 2021 at 1:33 AM Bryan Beaudreault wrote:
> Hey all,
>
> We run a bunch of big hbase clusters that get used by hundreds of product
> teams for a variety of real-time workloads. We are a B2B company, so most
> data has a customerId somewhere in the rowkey. As the team that owns the
> hbase infrastructure, we try to help product teams properly design schemas
> to avoid hotspotting, but inevitably it happens. It may not necessarily
> just be hotspotting, but for example request volume may not be evenly
> distributed across all regions of a table.
>
> This hotspotting/distribution issue makes it hard for the balancer to keep
> the cluster balanced from a load perspective -- sure, all RS have the same
> number of regions, but those regions are not all created equal from a load
> perspective. This results in cases where one RS might be consistently at
> 70% cpu, another might be at 30%, and all the rest are in a band
> in-between.
>
> We already have a normalizer job which works similarly to the
> SimpleRegionNormalizer -- keeping regions approximately the same size from
> a data size perspective. I'm considering updating our normalizer to also
> take into account region load.
>
> My general plan is to follow a similar strategy to the balancer -- keep a
> configurable number of RegionLoad objects in memory per-region, and extract
> averages for readRequestsCount from those. If a region's average load is >
> some threshold relative to other regions in the same table, split it. If
> it's < some threshold relative to other regions in the same table, merge
> it.
>
> I'm writing because I'm wondering if anyone else has had this problem and
> if there exists prior art here. Is there a reason HBase does not provide a
> configurable load-based normalizer (beyond typical OSS reasons -- no one
> contributed it yet)?
>
> Thanks!
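The hash-prefix ("salting") scheme described above can be sketched roughly as follows. This is an illustrative sketch, not the actual client abstraction from the thread: murmurhash is not in the Python standard library, so zlib.crc32 stands in as the stable hash, and the function names are hypothetical.

```python
import zlib


def salt_row_key(row_key: str, prefix_range: int = 1000) -> str:
    """Prefix a row key with a stable hash-derived bucket so writes
    spread across a pre-split region range (e.g. 000-999)."""
    # Any stable hash works; the thread mentions murmurhash. zlib.crc32
    # is used here only because it is in the standard library.
    bucket = zlib.crc32(row_key.encode("utf-8")) % prefix_range
    width = len(str(prefix_range - 1))  # 000-999 -> 3-digit prefix
    return f"{bucket:0{width}d}-{row_key}"


def strip_salt(salted_key: str) -> str:
    """Recover the original key. Since the prefix is deterministic,
    point reads can also recompute it from the unsalted key."""
    return salted_key.split("-", 1)[1]
```

Because the prefix is a pure function of the row key, the mapping never changes for the lifetime of the table, so the client abstraction can salt on write and re-derive the same prefix on read; the trade-off is that range scans over the original key order must fan out across all prefix buckets.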
Re: hbase slack
Could you please give us an email address?

bbeaudrea...@hubspot.com.invalid

Is this the expected one? There is an 'invalid' in it...

Bryan Beaudreault wrote on Tue, May 18, 2021 at 4:04 AM:
> Is there an existing user group slack of hbase users? If so can I have an invite?
>
> Thanks!
hbase slack
Is there an existing user group slack of hbase users? If so can I have an invite? Thanks!
HotSpot detection/mitigation worker?
Hey all,

We run a bunch of big hbase clusters that get used by hundreds of product teams for a variety of real-time workloads. We are a B2B company, so most data has a customerId somewhere in the rowkey. As the team that owns the hbase infrastructure, we try to help product teams properly design schemas to avoid hotspotting, but inevitably it happens. It may not necessarily just be hotspotting, but for example request volume may not be evenly distributed across all regions of a table.

This hotspotting/distribution issue makes it hard for the balancer to keep the cluster balanced from a load perspective -- sure, all RS have the same number of regions, but those regions are not all created equal from a load perspective. This results in cases where one RS might be consistently at 70% cpu, another might be at 30%, and all the rest are in a band in-between.

We already have a normalizer job which works similarly to the SimpleRegionNormalizer -- keeping regions approximately the same size from a data size perspective. I'm considering updating our normalizer to also take into account region load.

My general plan is to follow a similar strategy to the balancer -- keep a configurable number of RegionLoad objects in memory per-region, and extract averages for readRequestsCount from those. If a region's average load is > some threshold relative to other regions in the same table, split it. If it's < some threshold relative to other regions in the same table, merge it.

I'm writing because I'm wondering if anyone else has had this problem and if there exists prior art here. Is there a reason HBase does not provide a configurable load-based normalizer (beyond typical OSS reasons -- no one contributed it yet)?

Thanks!
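The averaging-and-threshold plan above can be sketched as follows. This is a hypothetical illustration of the decision logic only, not HBase code: `plan_actions` and the ratio parameters are invented names, and the per-region sample lists stand in for the sliding window of RegionLoad `readRequestsCount` values the email describes.

```python
from statistics import mean


def plan_actions(region_loads, split_ratio=2.0, merge_ratio=0.5):
    """Decide which regions of one table to split (hot) or merge (cold).

    region_loads: dict mapping region name -> list of recent
    readRequestsCount samples (the per-region sliding window).
    A region is split if its average load exceeds split_ratio times
    the table-wide average, and merged if it falls below merge_ratio
    times the table-wide average.
    """
    avg = {name: mean(samples) for name, samples in region_loads.items()}
    table_avg = mean(avg.values())
    plan = {}
    for name, load in avg.items():
        if load > split_ratio * table_avg:
            plan[name] = "split"
        elif load < merge_ratio * table_avg:
            plan[name] = "merge"
    return plan
```

Comparing each region against the average of its own table (rather than the whole cluster) keeps a uniformly hot table from being split indefinitely; the ratios would be the "configurable threshold" knobs.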
Re: [ANNOUNCE] New HBase Committer Xiaolin Ha(哈晓琳)
Congratulations, Xiaolin, and thank you for all your contributions!!

On Sat, May 15, 2021 at 7:11 AM 张铎(Duo Zhang) wrote:
> On behalf of the Apache HBase PMC, I am pleased to announce that Xiaolin
> Ha(sunhelly) has accepted the PMC's invitation to become a committer on the
> project. We appreciate all of Xiaolin's generous contributions thus far and
> look forward to her continued involvement.
>
> Congratulations and welcome, Xiaolin Ha!
>
> [The same announcement in Chinese:] On behalf of the Apache HBase PMC, I am
> pleased to announce that Xiaolin Ha has accepted our invitation to become a
> Committer on the Apache HBase project. Thank you, Xiaolin Ha, for your
> contributions to the HBase project, and we look forward to you taking on
> more responsibilities in the future.
>
> Welcome, Xiaolin Ha!