My main thought is that a migration like this should use bulk loading, which should be relatively easy given you already use MR (HFileOutputFormat2). It doesn't solve the region-splitting problem. With Admin.splitRegion, you can specify a split point. You can use that to iteratively add a bunch of regions wherever you need them in the keyspace. Yes, it's 2 at a time, but it should still be quick enough in the grand scheme of a large migration. Adding a splitRegion method that takes byte[][] for multiple split points would be a nice UX improvement, but not strictly necessary.
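Roughly what I had in mind for the iterative splitting, as a sketch against the HBase 2.x Admin API. The table name and split points below are made up for illustration, and in practice you'd want proper retry/backoff since split requests are handled asynchronously by the master:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExistingTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName table = TableName.valueOf("my_ns:my_table"); // hypothetical table

    // Hypothetical split points, e.g. derived from the source cluster's
    // region boundaries for the tenant being migrated.
    byte[][] splitPoints = {
        Bytes.toBytes("tenant123_chunk01"),
        Bytes.toBytes("tenant123_chunk02"),
        Bytes.toBytes("tenant123_chunk03"),
    };

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      for (byte[] splitPoint : splitPoints) {
        int before = admin.getRegions(table).size();
        // Splits the region currently containing splitPoint into two
        // daughters at that point. The request is asynchronous.
        admin.split(table, splitPoint);
        // Crude wait-for-completion: poll until the region count grows, so
        // the next split doesn't target a region that is still mid-split.
        while (admin.getRegions(table).size() <= before) {
          Thread.sleep(1000);
        }
      }
    }
  }
}

The wait matters mainly because consecutive split points for a newly migrated tenant will usually land in the same (initially empty) region, so each split has to finish before the next one can be applied.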
On Fri, Jan 26, 2024 at 12:10 PM Rushabh Shah <rushabh.s...@salesforce.com.invalid> wrote:
> Hi Everyone,
> At my workplace, we use HBase + Phoenix to run our customer workloads. Most
> of our Phoenix tables are multi-tenant and we store the tenantID as the
> leading part of the rowkey. Each tenant belongs to only one HBase cluster.
> Due to capacity planning, hardware refresh cycles and, most recently, the move
> to public cloud initiatives, we have to migrate a tenant from one HBase
> cluster (source cluster) to another HBase cluster (target cluster).
> Normally we migrate a lot of tenants (in the tens of thousands) at a time and
> hence we have to copy a huge amount of data (in TBs) from multiple source
> clusters to a single target cluster. We have an internal tool which uses the
> MapReduce framework to copy the data. Since all of these tenants don't have
> any presence on the target cluster (note that the table is NOT empty, since
> we have data for other tenants in the target cluster), they start with one
> region and, through the organic split process, the data gets distributed among
> different regions and different regionservers. But the organic splitting
> process takes a lot of time and, due to the distributed nature of the MR
> framework, it causes hotspotting issues on the target cluster which often
> last for days. This causes availability issues where the CPU and/or disk
> is saturated on the regionservers ingesting the data. It also causes a lot
> of replication-related alerts (Age of last ship, LogQueue size) which go on
> for days.
>
> In order to handle the huge influx of data, we should ideally pre-split the
> table on the target based on the split structure present on the source
> cluster. If we pre-split and create empty regions with the right region
> boundaries, it will help distribute the load to different regions and
> region servers and will prevent hotspotting.
>
> Problems with the above approach:
> 1. Currently we allow pre-splitting only while creating a new table. But in
> our production env, we already have the table created for other tenants. So
> we would like to pre-split an existing table for new tenants.
> 2. Currently we split a given region into just 2 daughter regions. But if
> we have the split point information from the source cluster and the
> data for the to-be-migrated tenant is split across 100 regions on the
> source side, we would ideally like to create 100 empty regions on the
> target cluster.
>
> Trying to get early feedback from the community. Do you all think this is a
> good idea? Open to other suggestions also.
>
>
> Thank you,
> Rushabh.
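For the "use the split structure from the source cluster" part, the split points could be pulled straight from the source table's region boundaries. Again just a sketch against the HBase 2.x Admin API; the tenant prefix and the way the source-cluster Configuration is built are placeholders:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.util.Bytes;

public class CollectTenantSplitPoints {

  /**
   * Returns the start keys of all source-cluster regions whose start key
   * begins with the given tenant prefix. Feeding these into Admin.split()
   * on the target table recreates the same region boundaries for that tenant.
   */
  static List<byte[]> tenantSplitPoints(Configuration sourceConf, TableName table,
      byte[] tenantPrefix) throws Exception {
    List<byte[]> splitPoints = new ArrayList<>();
    try (Connection conn = ConnectionFactory.createConnection(sourceConf);
         Admin admin = conn.getAdmin()) {
      for (RegionInfo region : admin.getRegions(table)) {
        byte[] startKey = region.getStartKey();
        if (Bytes.startsWith(startKey, tenantPrefix)) {
          splitPoints.add(startKey);
        }
      }
    }
    return splitPoints;
  }

  public static void main(String[] args) throws Exception {
    // Placeholder: point this Configuration at the *source* cluster's quorum.
    Configuration sourceConf = HBaseConfiguration.create();
    List<byte[]> splits = tenantSplitPoints(sourceConf,
        TableName.valueOf("my_ns:my_table"), Bytes.toBytes("tenant123"));
    splits.forEach(s -> System.out.println(Bytes.toStringBinary(s)));
  }
}

You'd probably also want one extra split at the tenant prefix itself, so the migrated tenant's first rows don't land in a region shared with a neighboring tenant on the target.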