This is the first time I've heard of a region split taking 4 minutes. For us, it's always on the order of seconds. That's true even for a large 50+ GB region. It might be worth looking into why that's so slow for you.
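For reference, here is a minimal sketch of how one might time a single split with the 2.x Java client (the table name, split point, and 30-minute timeout below are placeholders). The Future returned by splitRegionAsync tracks the split procedure, so waiting on it measures roughly how long the daughters take to come online:

import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeOneSplit {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("my_table");  // placeholder
      byte[] splitPoint = Bytes.toBytes("someRowKey");  // placeholder
      // Find the region that currently contains the split point.
      RegionInfo region = admin.getRegions(table).stream()
          .filter(r -> r.containsRow(splitPoint))
          .findFirst()
          .orElseThrow(() -> new IllegalStateException("no region for split point"));
      long start = System.nanoTime();
      // The Future completes once the split procedure finishes and the
      // daughter regions are online.
      admin.splitRegionAsync(region.getRegionName(), splitPoint)
          .get(30, TimeUnit.MINUTES);
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      System.out.println("split finished in " + elapsedMs + " ms");
    }
  }
}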
On Wed, Feb 7, 2024 at 12:50 PM Rushabh Shah <rushabh.s...@salesforce.com.invalid> wrote:
> Thank you Andrew, Bryan and Duo for your responses.
>
> > My main thought is that a migration like this should use bulk loading,
>
> > But also, I think, that data transfer should be in bulk
>
> We are working on moving to bulk loading.
>
> > With Admin.splitRegion, you can specify a split point. You can use that to
> > iteratively add a bunch of regions wherever you need them in the keyspace.
> > Yes, it's 2 at a time, but it should still be quick enough in the grand
> > scheme of a large migration.
>
> Trying to do some back-of-the-envelope calculations:
> In a production environment, it took around 4 minutes to split a recently
> split region which had 4 store files with a total of 5 GB of data. Assume
> we are migrating 5,000 tenants at a time; normally around 10% of the
> tenants (500 tenants) have data spread across more than 1,000 regions. We
> have around 10 huge tables where we store the tenants' data for different
> use cases. All of the numbers above are on the *conservative* side.
>
> To create a split structure of 1,000 regions, we need 10 iterations of
> splits (2^10 = 1024), assuming we split the regions in parallel. Each split
> takes around 4 minutes, so creating 1,000 regions for just 1 tenant and 1
> table takes around 40 minutes. For 10 tables for 1 tenant, that is around
> 400 minutes.
>
> For 500 tenants, this would take around *140 days*. To reduce this further,
> we could also create the split structure for each tenant and each table in
> parallel, but that would put a lot of pressure on the cluster, require a
> lot of operational overhead, and still leave the whole process taking days,
> if not months.
>
> Since we are moving our infrastructure to the public cloud, we anticipate a
> migration of this size happening once every month.
>
> > Adding a splitRegion method that takes byte[][] for multiple split points
> > would be a nice UX improvement, but not strictly necessary.
>
> IMHO, for all the reasons stated above, I believe this is necessary.
>
>
> On Mon, Jan 29, 2024 at 6:25 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
> >
> > As it is called a 'pre' split, it can only happen when there is no data
> > in the table.
> >
> > If there is already data in the table, you cannot always create 'empty'
> > regions, as you do not know whether there is already data in the given
> > range...
> >
> > And technically, if you want to split an HFile into more than 2 parts,
> > you need to design a new algorithm, as HBase currently only supports a
> > top reference and a bottom reference...
> >
> > Thanks.
> >
> > On Sat, Jan 27, 2024 at 2:16 AM Bryan Beaudreault <bbeaudrea...@apache.org> wrote:
> > >
> > > My main thought is that a migration like this should use bulk loading,
> > > which should be relatively easy given you already use MR
> > > (HFileOutputFormat2). It doesn't solve the region-splitting problem.
> > > With Admin.splitRegion, you can specify a split point. You can use that
> > > to iteratively add a bunch of regions wherever you need them in the
> > > keyspace. Yes, it's 2 at a time, but it should still be quick enough in
> > > the grand scheme of a large migration. Adding a splitRegion method that
> > > takes byte[][] for multiple split points would be a nice UX
> > > improvement, but not strictly necessary.
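To make the approach being quantified above concrete, here is a rough sketch, not an actual implementation, of replaying a tenant's split points from the source cluster onto the target table with the existing 2-way split API. It assumes an HBase 2.x client; the table name, tenant prefix, and cluster configurations are placeholders, and waiting/retry logic is omitted:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.util.Bytes;

public class ReplayTenantSplits {
  public static void main(String[] args) throws Exception {
    TableName table = TableName.valueOf("MULTI_TENANT_TABLE");  // placeholder
    byte[] tenantPrefix = Bytes.toBytes("tenant00001");         // placeholder

    Configuration sourceConf = HBaseConfiguration.create();     // point at source cluster
    Configuration targetConf = HBaseConfiguration.create();     // point at target cluster

    try (Connection source = ConnectionFactory.createConnection(sourceConf);
         Connection target = ConnectionFactory.createConnection(targetConf);
         Admin sourceAdmin = source.getAdmin();
         Admin targetAdmin = target.getAdmin()) {

      // Collect the tenant's region boundaries on the source cluster.
      List<byte[]> splitPoints = new ArrayList<>();
      for (RegionInfo region : sourceAdmin.getRegions(table)) {
        byte[] startKey = region.getStartKey();
        if (startKey.length > 0 && Bytes.startsWith(startKey, tenantPrefix)) {
          splitPoints.add(startKey);
        }
      }

      // Replay them on the target table, one split at a time. Each call
      // splits whichever target region currently contains the point, so
      // after N calls the tenant's keyspace has N extra (empty) regions.
      for (byte[] splitPoint : splitPoints) {
        targetAdmin.split(table, splitPoint);
        // In practice you would wait for each split (or each "generation"
        // of splits) to finish before issuing the next one.
      }
    }
  }
}

Because each call only splits the one region containing the point, points that land in a region whose split is still in flight have to wait for the previous procedure to finish, which is exactly where the 4-minutes-per-split figure quoted above starts to hurt.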
> > >
> > > On Fri, Jan 26, 2024 at 12:10 PM Rushabh Shah
> > > <rushabh.s...@salesforce.com.invalid> wrote:
> > > >
> > > > Hi Everyone,
> > > > At my workplace, we use HBase + Phoenix to run our customer workloads.
> > > > Most of our Phoenix tables are multi-tenant, and we store the tenantID
> > > > as the leading part of the rowkey. Each tenant belongs to only one
> > > > HBase cluster. Due to capacity planning, hardware refresh cycles and,
> > > > most recently, move-to-public-cloud initiatives, we have to migrate
> > > > tenants from one HBase cluster (the source cluster) to another (the
> > > > target cluster). Normally we migrate a lot of tenants (in the tens of
> > > > thousands) at a time, and hence we have to copy a huge amount of data
> > > > (in TBs) from multiple source clusters to a single target cluster. We
> > > > have an internal tool which uses the MapReduce framework to copy the
> > > > data. Since these tenants don't have any presence on the target
> > > > cluster (note that the table is NOT empty, since it holds data for
> > > > other tenants on the target cluster), they start with one region, and
> > > > the data only gets distributed across regions and regionservers
> > > > through the organic split process. But organic splitting takes a lot
> > > > of time, and due to the distributed nature of the MR framework it
> > > > causes hotspotting on the target cluster which often lasts for days.
> > > > This leads to availability issues where CPU and/or disk are saturated
> > > > on the regionservers ingesting the data, and to a lot of
> > > > replication-related alerts (Age of last ship, LogQueue size) that go
> > > > on for days.
> > > >
> > > > To handle the huge influx of data, we should ideally pre-split the
> > > > table on the target based on the split structure present on the
> > > > source cluster. If we pre-split and create empty regions with the
> > > > right boundaries, the load will be distributed across regions and
> > > > region servers, which will prevent hotspotting.
> > > >
> > > > Problems with the above approach:
> > > > 1. Currently we allow pre-splitting only while creating a new table,
> > > > but in our production environment the table already exists for other
> > > > tenants. We would like to pre-split an existing table for new tenants.
> > > > 2. Currently we split a given region into just 2 daughter regions. If
> > > > we have the split points from the source cluster and the data for the
> > > > to-be-migrated tenant is spread across 100 regions on the source side,
> > > > we would ideally like to create 100 empty regions on the target
> > > > cluster.
> > > >
> > > > Trying to get early feedback from the community. Do you all think this
> > > > is a good idea? Open to other suggestions as well.
> > > >
> > > > Thank you,
> > > > Rushabh.
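For completeness, the bulk-loading path mentioned in the thread looks roughly like the sketch below, assuming an HBase 2.2+ client (on older versions LoadIncrementalHFiles plays the role of BulkLoadHFiles); the table name, output path, and the mapper that generates the cells are placeholders. Note that HFileOutputFormat2.configureIncrementalLoad() partitions the job output along the target table's current region boundaries, which is why pre-splitting for the incoming tenants matters even on the bulk-load path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.tool.BulkLoadHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TenantBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName tableName = TableName.valueOf("MULTI_TENANT_TABLE");  // placeholder
    Path hfileDir = new Path("/tmp/tenant-migration/hfiles");       // placeholder

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {

      Job job = Job.getInstance(conf, "tenant-migration-bulkload");
      // ... set the input format and a mapper that emits
      //     (ImmutableBytesWritable rowkey, Put/Cell) pairs ...
      // Wires in the reducer, total-order partitioner and output format so
      // the generated HFiles line up with the target table's current regions.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
      FileOutputFormat.setOutputPath(job, hfileDir);
      if (!job.waitForCompletion(true)) {
        throw new RuntimeException("HFile-generating MR job failed");
      }
    }

    // Hand the generated HFiles directly to the region servers, bypassing
    // the normal write path (no WAL or memstore flushes).
    BulkLoadHFiles.create(conf).bulkLoad(tableName, hfileDir);
  }
}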