[ 
https://issues.apache.org/jira/browse/HBASE-28533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Roudnitsky updated HBASE-28533:
--------------------------------------
    Description: 
SplitTableRegionProcedure 

When a SplitTableRegionProcedure is run for a region whose namespace is at its 
maximum region quota limit, the split procedure fails and rolls back, but 
HMaster's in-memory RegionStateNode for the region is left in the SPLITTING 
state. HMaster then refuses to start any subsequent merge/split/move 
procedures for that region because it believes the region is not OPEN, until 
HMaster is restarted and the in-memory record of region states is reset.

In the first step of the split procedure, SPLIT_TABLE_REGION_PREPARE, the 
parent region's RegionStateNode state is set to SPLITTING, and the transition 
is not written to the meta table. In the next step, 
SPLIT_TABLE_REGION_PRE_OPERATION, the region quota check is done, 
QuotaExceededException is thrown, and the procedure ends in the ROLLEDBACK 
state without reverting the RegionStateNode to OPEN. HMaster is left believing 
the region is SPLITTING according to its in-memory RegionStates, while the 
region is still online both on the assigned region server and according to meta.
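
The sequence above can be illustrated with a small standalone Java model. This is only a sketch under stated assumptions: the class SplitRollbackSketch, its two maps, the revertOnRollback flag, and the local QuotaExceededException are hypothetical stand-ins for illustration and are not HBase's actual classes or code paths.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of the scenario above; none of this is HBase code. */
public class SplitRollbackSketch {
    enum State { OPEN, SPLITTING }

    /** Local stand-in for org.apache.hadoop.hbase.quotas.QuotaExceededException. */
    static class QuotaExceededException extends Exception {
        QuotaExceededException(String message) { super(message); }
    }

    // Hypothetical stand-ins: the master's in-memory RegionStates and the meta table.
    private final Map<String, State> inMemory = new HashMap<>();
    private final Map<String, State> meta = new HashMap<>();
    private final int maxRegions;

    public SplitRollbackSketch(int maxRegions, String... regions) {
        this.maxRegions = maxRegions;
        for (String region : regions) {
            inMemory.put(region, State.OPEN);
            meta.put(region, State.OPEN);
        }
    }

    /** Runs the first two split steps; returns false if the procedure rolled back. */
    public boolean split(String region, boolean revertOnRollback) {
        // PREPARE: only the in-memory state changes, meta is not updated.
        inMemory.put(region, State.SPLITTING);
        try {
            checkQuota(region);
            return true;
        } catch (QuotaExceededException e) {
            // Rollback: the reported behavior corresponds to revertOnRollback == false.
            if (revertOnRollback) {
                inMemory.put(region, State.OPEN);
            }
            return false;
        }
    }

    // PRE_OPERATION: a split replaces one region with two, so it adds one region.
    private void checkQuota(String region) throws QuotaExceededException {
        if (meta.size() + 1 > maxRegions) {
            throw new QuotaExceededException("Region split not possible for " + region);
        }
    }

    /** In this model a merge only starts when the master believes both regions are OPEN. */
    public boolean canStartMerge(String a, String b) {
        return inMemory.get(a) == State.OPEN && inMemory.get(b) == State.OPEN;
    }

    public String inMemoryState(String region) { return inMemory.get(region).name(); }

    public String metaState(String region) { return meta.get(region).name(); }

    public static void main(String[] args) {
        SplitRollbackSketch master = new SplitRollbackSketch(2, "region_a", "region_b");
        master.split("region_a", false);
        System.out.println("in-memory=" + master.inMemoryState("region_a")
            + " meta=" + master.metaState("region_a")
            + " mergeAllowed=" + master.canStartMerge("region_a", "region_b"));
    }
}
```

Calling split with revertOnRollback set to true models one possible direction for a fix, where rollback restores the in-memory state to OPEN. That flag is an assumption of this sketch, not a description of how the issue is or will be resolved.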

To reproduce in HBase shell:
{code:java}
> create_namespace 'test_ns', {'hbase.namespace.quota.maxregions'=> 2}
> create 'test_ns:test_table', 'f1', {NUMREGIONS => 2, SPLITALGO => 'UniformSplit'}
> region_a = <first region from list_regions 'test_ns:test_table'>
> region_b = <second region from list_regions 'test_ns:test_table'>

> split region_a, 'x'
# HMaster will report: 
pid=405, state=ROLLEDBACK, 
exception=org.apache.hadoop.hbase.quotas.QuotaExceededException via 
master-split-regions:org.apache.hadoop.hbase.quotas.QuotaExceededException: 
Region split not possible for :<region_a> as quota limits are exceeded ; 
SplitTableRegionProcedure table=test_ns:test_table, parent=...

> merge_region region_a, region_b
ERROR: org.apache.hadoop.hbase.exceptions.MergeRegionException: 
org.apache.hadoop.hbase.client.DoNotRetryRegionException: <region_a> is not 
OPEN; state=SPLITTING

> stop_master # trigger HMaster failover
> merge_region region_a, region_b # merge now succeeds
{code}


> Split procedure rollback can leave parent region state in SPLITTING after 
> completion
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-28533
>                 URL: https://issues.apache.org/jira/browse/HBASE-28533
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>         Environment: Tested on HBase Version 2.5.8 and latest master branch 
>            Reporter: Daniel Roudnitsky
>            Assignee: Daniel Roudnitsky
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
