[ https://issues.apache.org/jira/browse/HBASE-28533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wellington Chevreuil reassigned HBASE-28533:
--------------------------------------------

    Assignee: Daniel Roudnitsky

> Region split failure due to region quota limit leaves Hmaster's in memory
> state for the region in SPLITTING after procedure rollback
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-28533
>                 URL: https://issues.apache.org/jira/browse/HBASE-28533
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>         Environment: Tested on HBase Version 2.5.8 and latest master branch
>            Reporter: Daniel Roudnitsky
>            Assignee: Daniel Roudnitsky
>            Priority: Major
>
> When a SplitTableRegionProcedure is run for a region whose namespace is at
> its maximum region quota limit, the split procedure fails and rolls back,
> but HMaster's in-memory RegionStateNode for the region is left in the
> SPLITTING state. HMaster then refuses to start any subsequent
> merge/split/move procedure for that region because it believes the region is
> not OPEN. This persists until HMaster is restarted and its in-memory record
> of region states is reset.
>
> In the first step of the split procedure, SPLIT_TABLE_REGION_PREPARE, the
> parent region's RegionStateNode state is set to SPLITTING, and the transition
> is not written to the meta table. In the next step,
> SPLIT_TABLE_REGION_PRE_OPERATION, the region quota check is done, a
> QuotaExceededException is thrown, and the procedure ends in the ROLLEDBACK
> state without reverting the RegionStateNode to OPEN. HMaster is left
> believing the region is SPLITTING according to its in-memory RegionStates,
> while the region is still online both on its assigned region server and
> according to meta.
>
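The sequence described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `SplitRollbackSketch`, `RegionNode`, `runSplit` and the `revertOnRollback` flag are hypothetical names invented for illustration, not HBase classes or the actual SplitTableRegionProcedure code, and reverting the state in the rollback path is one possible remedy rather than necessarily the fix adopted for this issue.

```java
// Toy model only: every type here is a hypothetical stand-in, not an HBase class.
class SplitRollbackSketch {
    enum RegionState { OPEN, SPLITTING }

    static class QuotaExceededException extends RuntimeException {
        QuotaExceededException(String msg) { super(msg); }
    }

    /** Stand-in for the master's in-memory record of one region. */
    static class RegionNode {
        RegionState state = RegionState.OPEN;
    }

    /**
     * Runs a two-step toy split and returns the in-memory state afterwards.
     * revertOnRollback = false mimics the behaviour reported in this issue.
     */
    static RegionState runSplit(RegionNode node, int regionsInNamespace,
                                int maxRegions, boolean revertOnRollback) {
        try {
            // Step 1 (PREPARE): mark the parent as SPLITTING, in memory only.
            node.state = RegionState.SPLITTING;

            // Step 2 (PRE_OPERATION): a split adds one region net, so check the quota.
            if (regionsInNamespace + 1 > maxRegions) {
                throw new QuotaExceededException("quota limits are exceeded");
            }
            // The remaining split steps are out of scope for this sketch.
        } catch (QuotaExceededException e) {
            // Rollback path: step 1 is undone only when revertOnRollback is set.
            if (revertOnRollback) {
                node.state = RegionState.OPEN;
            }
        }
        return node.state;
    }

    public static void main(String[] args) {
        // Namespace already holds 2 regions with a quota of 2, as in the report.
        System.out.println("without revert: " + runSplit(new RegionNode(), 2, 2, false));
        System.out.println("with revert:    " + runSplit(new RegionNode(), 2, 2, true));
    }
}
```

With the namespace already at its quota, the first call leaves the toy node in SPLITTING, which corresponds to the stuck state in the report, while the second call returns it to OPEN.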
> To reproduce in HBase shell:
> {code:java}
> > create_namespace 'test_ns', {'hbase.namespace.quota.maxregions'=> 2}
> > create 'test_ns:test_table', 'f1', {NUMREGIONS => 2, SPLITALGO => 'UniformSplit'}
> > region_a = <first region from list_regions 'test_ns:test_table'>
> > region_b = <second region from list_regions 'test_ns:test_table'>
> > split region_a, 'x'
> # HMaster will report:
> pid=405, state=ROLLEDBACK,
> exception=org.apache.hadoop.hbase.quotas.QuotaExceededException via
> master-split-regions:org.apache.hadoop.hbase.quotas.QuotaExceededException:
> Region split not possible for :<region_a> as quota limits are exceeded ;
> SplitTableRegionProcedure table=test_ns:test_table, parent=...
> > merge_region region_a, region_b
> ERROR: org.apache.hadoop.hbase.exceptions.MergeRegionException:
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: <region_a> is not
> OPEN; state=SPLITTING
> > stop_master # trigger hmaster failover
> > merge_region region_a, region_b # merge now succeeds
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)