Agree. It solves the problem in terms of randomizing distribution, but does
not increase cardinality.
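
To illustrate the distinction between distribution and cardinality, here is a minimal sketch. It assumes a hash prefix derived from the customerId, which is a hypothetical scheme: the thread does not spell out the actual key layout. NUM_BUCKETS, prefixed_rowkey, and bucket_counts are made-up names for illustration only.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16  # hypothetical number of prefix buckets


def prefixed_rowkey(customer_id: str, suffix: str) -> str:
    """Build a rowkey whose leading bucket is a hash of the customer id."""
    digest = hashlib.md5(customer_id.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02x}|{customer_id}|{suffix}"


def bucket_counts(rows):
    """Count how many rowkeys land in each prefix bucket."""
    return Counter(prefixed_rowkey(c, s).split("|", 1)[0] for c, s in rows)


# Many small customers: the hash prefix scatters them across buckets.
many_customers = [(f"customer-{i}", "event") for i in range(1000)]
# One large customer: every row hashes to the same bucket.
one_big_customer = [("customer-42", f"event-{i}") for i in range(1000)]

spread = bucket_counts(many_customers)
skewed = bucket_counts(one_big_customer)

print("buckets used by many customers:", len(spread))
print("buckets used by one big customer:", len(skewed))
```

Under these assumptions, the prefix randomizes where each customer lands, but all rows for a single customer still share one prefix, so one very large customer stays concentrated no matter how the keys are hashed.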
On Tue, May 18, 2021 at 6:29 PM Bryan Beaudreault wrote:
> Hi Mallikarjun, thanks for the response.
>
> I agree that it is hard to fully mitigate a bad rowkey design. We do make
> pretty heavy
Hi Mallikarjun, thanks for the response.
I agree that it is hard to fully mitigate a bad rowkey design. We do make
pretty heavy use of hash prefixes, and we don't really have many examples
of the common problem you describe where the "latest" data is in 1-2
regions. Our distribution issues
I think that no matter how well a balancer cost function is written, it cannot
compensate for a suboptimal row key design. Say, for example, you have 10
regionservers and 100 regions, and your application is heavy on the latest data,
which sits mostly in 1 or 2 regions; however many splits and/or merges it
Hey all,
We run a bunch of big HBase clusters that get used by hundreds of product
teams for a variety of real-time workloads. We are a B2B company, so most
data has a customerId somewhere in the rowkey. As the team that owns the
HBase infrastructure, we try to help product teams properly design