This is the first time I've heard of a region split taking 4 minutes. For us, it's always on the order of seconds. That's true even for a large 50+ GB region. It might be worth looking into why that's so slow for you.
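For reference, here is a minimal sketch of how one might time a single split with the 2.x Java client (the table name, split point, and 30-minute timeout below are placeholders). The Future returned by splitRegionAsync tracks the split procedure, so waiting on it measures roughly how long the daughters take to come online:

import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeOneSplit {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("my_table");  // placeholder
      byte[] splitPoint = Bytes.toBytes("someRowKey");  // placeholder
      // Find the region that currently contains the split point.
      RegionInfo region = admin.getRegions(table).stream()
          .filter(r -> r.containsRow(splitPoint))
          .findFirst()
          .orElseThrow(() -> new IllegalStateException("no region for split point"));
      long start = System.nanoTime();
      // The Future completes once the split procedure finishes and the
      // daughter regions are online.
      admin.splitRegionAsync(region.getRegionName(), splitPoint)
          .get(30, TimeUnit.MINUTES);
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      System.out.println("split finished in " + elapsedMs + " ms");
    }
  }
}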
On Wed, Feb 7, 2024 at 12:50 PM Rushabh Shah <rushabh.s...@salesforce.com.invalid> wrote:
> Thank you Andrew, Bryan and Duo for your responses.
>
> > My main thought is that a migration like this should use bulk loading,
>
> > But also, I think, that data transfer should be in bulk
>
> We are working on moving to bulk loading.
>
> > With Admin.splitRegion, you can specify a split point. You can use that to
> > iteratively add a bunch of regions wherever you need them in the keyspace.
> > Yes, it's 2 at a time, but it should still be quick enough in the grand
> > scheme of a large migration.
>
> Trying to do some back-of-the-envelope calculations:
> In a production environment, it took around 4 minutes to split a recently
> split region which had 4 store files with a total of 5 GB of data. Assume
> we are migrating 5,000 tenants at a time; normally around 10% of the
> tenants (500 tenants) have data spread across more than 1,000 regions. We
> have around 10 huge tables where we store the tenants' data for different
> use cases. All of the numbers above are on the *conservative* side.
>
> To create a split structure of 1,000 regions, we need 10 iterations of
> splits (2^10 = 1024), assuming we split the regions in parallel. Each split
> takes around 4 minutes, so creating 1,000 regions for just 1 tenant and 1
> table takes around 40 minutes. For 10 tables for 1 tenant, that is around
> 400 minutes.
>
> For 500 tenants, this would take around *140 days*. To reduce this further,
> we could also create the split structure for each tenant and each table in
> parallel, but that would put a lot of pressure on the cluster, require a
> lot of operational overhead, and still leave the whole process taking days,
> if not months.
>
> Since we are moving our infrastructure to the public cloud, we anticipate a
> migration of this size happening once every month.
>
> > Adding a splitRegion method that takes byte[][] for multiple split points
> > would be a nice UX improvement, but not strictly necessary.
>
> IMHO, for all the reasons stated above, I believe this is necessary.
>
>
> On Mon, Jan 29, 2024 at 6:25 AM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:
> >
> > As it is called a 'pre' split, it can only happen when there is no data
> > in the table.
> >
> > If there is already data in the table, you cannot always create 'empty'
> > regions, as you do not know whether there is already data in the given
> > range...
> >
> > And technically, if you want to split an HFile into more than 2 parts,
> > you need to design a new algorithm, as HBase currently only supports a
> > top reference and a bottom reference...
> >
> > Thanks.
> >
> > On Sat, Jan 27, 2024 at 2:16 AM Bryan Beaudreault <bbeaudrea...@apache.org> wrote:
> > >
> > > My main thought is that a migration like this should use bulk loading,
> > > which should be relatively easy given you already use MR
> > > (HFileOutputFormat2). It doesn't solve the region-splitting problem.
> > > With Admin.splitRegion, you can specify a split point. You can use that
> > > to iteratively add a bunch of regions wherever you need them in the
> > > keyspace. Yes, it's 2 at a time, but it should still be quick enough in
> > > the grand scheme of a large migration. Adding a splitRegion method that
> > > takes byte[][] for multiple split points would be a nice UX
> > > improvement, but not strictly necessary.
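To make the approach being quantified above concrete, here is a rough sketch, not an actual implementation, of replaying a tenant's split points from the source cluster onto the target table with the existing 2-way split API. It assumes an HBase 2.x client; the table name, tenant prefix, and cluster configurations are placeholders, and waiting/retry logic is omitted:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.util.Bytes;

public class ReplayTenantSplits {
  public static void main(String[] args) throws Exception {
    TableName table = TableName.valueOf("MULTI_TENANT_TABLE");  // placeholder
    byte[] tenantPrefix = Bytes.toBytes("tenant00001");         // placeholder

    Configuration sourceConf = HBaseConfiguration.create();     // point at source cluster
    Configuration targetConf = HBaseConfiguration.create();     // point at target cluster

    try (Connection source = ConnectionFactory.createConnection(sourceConf);
         Connection target = ConnectionFactory.createConnection(targetConf);
         Admin sourceAdmin = source.getAdmin();
         Admin targetAdmin = target.getAdmin()) {

      // Collect the tenant's region boundaries on the source cluster.
      List<byte[]> splitPoints = new ArrayList<>();
      for (RegionInfo region : sourceAdmin.getRegions(table)) {
        byte[] startKey = region.getStartKey();
        if (startKey.length > 0 && Bytes.startsWith(startKey, tenantPrefix)) {
          splitPoints.add(startKey);
        }
      }

      // Replay them on the target table, one split at a time. Each call
      // splits whichever target region currently contains the point, so
      // after N calls the tenant's keyspace has N extra (empty) regions.
      for (byte[] splitPoint : splitPoints) {
        targetAdmin.split(table, splitPoint);
        // In practice you would wait for each split (or each "generation"
        // of splits) to finish before issuing the next one.
      }
    }
  }
}

Because each call only splits the one region containing the point, points that land in a region whose split is still in flight have to wait for the previous procedure to finish, which is exactly where the 4-minutes-per-split figure quoted above starts to hurt.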
> > >
> > > On Fri, Jan 26, 2024 at 12:10 PM Rushabh Shah
> > > <rushabh.s...@salesforce.com.invalid> wrote:
> > > >
> > > > Hi Everyone,
> > > > At my workplace, we use HBase + Phoenix to run our customer workloads.
> > > > Most of our Phoenix tables are multi-tenant, and we store the tenantID
> > > > as the leading part of the rowkey. Each tenant belongs to only one
> > > > HBase cluster. Due to capacity planning, hardware refresh cycles and,
> > > > most recently, move-to-public-cloud initiatives, we have to migrate
> > > > tenants from one HBase cluster (the source cluster) to another (the
> > > > target cluster). Normally we migrate a lot of tenants (in the tens of
> > > > thousands) at a time, and hence we have to copy a huge amount of data
> > > > (in TBs) from multiple source clusters to a single target cluster. We
> > > > have an internal tool which uses the MapReduce framework to copy the
> > > > data. Since these tenants don't have any presence on the target
> > > > cluster (note that the table is NOT empty, since it holds data for
> > > > other tenants on the target cluster), they start with one region, and
> > > > the data only gets distributed across regions and regionservers
> > > > through the organic split process. But organic splitting takes a lot
> > > > of time, and due to the distributed nature of the MR framework it
> > > > causes hotspotting on the target cluster which often lasts for days.
> > > > This leads to availability issues where CPU and/or disk are saturated
> > > > on the regionservers ingesting the data, and to a lot of
> > > > replication-related alerts (Age of last ship, LogQueue size) that go
> > > > on for days.
> > > >
> > > > To handle the huge influx of data, we should ideally pre-split the
> > > > table on the target based on the split structure present on the
> > > > source cluster. If we pre-split and create empty regions with the
> > > > right boundaries, the load will be distributed across regions and
> > > > region servers, which will prevent hotspotting.
> > > >
> > > > Problems with the above approach:
> > > > 1. Currently we allow pre-splitting only while creating a new table,
> > > > but in our production environment the table already exists for other
> > > > tenants. We would like to pre-split an existing table for new tenants.
> > > > 2. Currently we split a given region into just 2 daughter regions. If
> > > > we have the split points from the source cluster and the data for the
> > > > to-be-migrated tenant is spread across 100 regions on the source side,
> > > > we would ideally like to create 100 empty regions on the target
> > > > cluster.
> > > >
> > > > Trying to get early feedback from the community. Do you all think this
> > > > is a good idea? Open to other suggestions as well.
> > > >
> > > > Thank you,
> > > > Rushabh.
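For completeness, the bulk-loading path mentioned in the thread looks roughly like the sketch below, assuming an HBase 2.2+ client (on older versions LoadIncrementalHFiles plays the role of BulkLoadHFiles); the table name, output path, and the mapper that generates the cells are placeholders. Note that HFileOutputFormat2.configureIncrementalLoad() partitions the job output along the target table's current region boundaries, which is why pre-splitting for the incoming tenants matters even on the bulk-load path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.tool.BulkLoadHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TenantBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName tableName = TableName.valueOf("MULTI_TENANT_TABLE");  // placeholder
    Path hfileDir = new Path("/tmp/tenant-migration/hfiles");       // placeholder

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {

      Job job = Job.getInstance(conf, "tenant-migration-bulkload");
      // ... set the input format and a mapper that emits
      //     (ImmutableBytesWritable rowkey, Put/Cell) pairs ...
      // Wires in the reducer, total-order partitioner and output format so
      // the generated HFiles line up with the target table's current regions.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
      FileOutputFormat.setOutputPath(job, hfileDir);
      if (!job.waitForCompletion(true)) {
        throw new RuntimeException("HFile-generating MR job failed");
      }
    }

    // Hand the generated HFiles directly to the region servers, bypassing
    // the normal write path (no WAL or memstore flushes).
    BulkLoadHFiles.create(conf).bulkLoad(tableName, hfileDir);
  }
}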