[ 
https://issues.apache.org/jira/browse/HBASE-20881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577508#comment-16577508
 ] 

Duo Zhang commented on HBASE-20881:
-----------------------------------

I looped the test 100 times locally and it finally failed with:

{noformat}
2018-08-12 19:57:18,174 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): 
CODE-BUG: Uncaught runtime exception for pid=83, 
state=FAILED:SPLIT_TABLE_REGION_UPDATE_META, hasLock=true, 
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException:
 Max attempts 10 exceeded; SplitTableRegionProcedure 
table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4, 
daughterA=de5ab31764b272230cb50ca31b8ecbdb, 
daughterB=0d190bf20801e4bb12d6aaf40e971340
java.lang.UnsupportedOperationException: pid=83, 
state=FAILED:SPLIT_TABLE_REGION_PRE_OPERATION_AFTER_META, hasLock=true, 
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException:
 Max attempts 10 exceeded; SplitTableRegionProcedure 
table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4, 
daughterA=de5ab31764b272230cb50ca31b8ecbdb, 
daughterB=0d190bf20801e4bb12d6aaf40e971340 unhandled 
state=SPLIT_TABLE_REGION_PRE_OPERATION_AFTER_META
        at 
org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:320)
        at 
org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:1)
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:886)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1436)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1392)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1270)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$7(ProcedureExecutor.java:1251)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1822)
2018-08-12 19:57:18,174 WARN  [PEWorker-1] 
procedure2.ProcedureExecutor$Testing(98): Toggle KILL before store update to: 
true
2018-08-12 19:57:18,192 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): 
CODE-BUG: Uncaught runtime exception for pid=83, 
state=FAILED:SPLIT_TABLE_REGION_PRE_OPERATION_BEFORE_META, hasLock=true, 
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException:
 Max attempts 10 exceeded; SplitTableRegionProcedure 
table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4, 
daughterA=de5ab31764b272230cb50ca31b8ecbdb, 
daughterB=0d190bf20801e4bb12d6aaf40e971340
java.lang.UnsupportedOperationException: pid=83, 
state=FAILED:SPLIT_TABLE_REGION_UPDATE_META, hasLock=true, 
exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException:
 Max attempts 10 exceeded; SplitTableRegionProcedure 
table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4, 
daughterA=de5ab31764b272230cb50ca31b8ecbdb, 
daughterB=0d190bf20801e4bb12d6aaf40e971340 unhandled 
state=SPLIT_TABLE_REGION_UPDATE_META
        at 
org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:320)
        at 
org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:1)
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:886)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1436)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1392)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1270)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$7(ProcedureExecutor.java:1251)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1822)
{noformat}
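
Both traces end in SplitTableRegionProcedure.rollbackState throwing an UnsupportedOperationException with "unhandled state=...". As a minimal sketch of that general pattern (not the actual HBase source; the class name, the reduced state list, and the returned strings are all hypothetical), a state-machine rollback that only knows how to undo the early states would look roughly like this:

```java
// Hypothetical, simplified sketch of a state-machine rollback.
// This is NOT the HBase SplitTableRegionProcedure code; the states and
// messages are assumptions chosen only to mirror the shape of the log above.
final class RollbackSketch {

    // A reduced, made-up subset of split states, in execution order.
    enum State {
        PREPARE,
        CLOSE_PARENT_REGION,
        UPDATE_META,
        PRE_OPERATION_AFTER_META
    }

    // Undo the work done in the given state, or refuse if there is no handler.
    static String rollbackState(State state) {
        switch (state) {
            case PREPARE:
                return "nothing to undo";
            case CLOSE_PARENT_REGION:
                return "reopen parent region";
            default:
                // No rollback handler for this state: fail loudly.
                throw new UnsupportedOperationException("unhandled state=" + state);
        }
    }

    public static void main(String[] args) {
        System.out.println(rollbackState(State.CLOSE_PARENT_REGION));
        try {
            rollbackState(State.UPDATE_META);
        } catch (UnsupportedOperationException e) {
            System.out.println("rollback refused: " + e.getMessage());
        }
    }
}
```

In this sketch, a failure that arrives after the procedure has advanced to a state without a rollback handler surfaces as an uncaught runtime exception in the worker, which is the shape of the error in the log. Whether that is what happens here still needs to be confirmed.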

Let me dig more.

> Introduce a region transition procedure to handle all the state transition 
> for a region
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-20881
>                 URL: https://issues.apache.org/jira/browse/HBASE-20881
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0
>
>         Attachments: HBASE-20881-v1.patch, HBASE-20881-v2.patch, 
> HBASE-20881-v3.patch, HBASE-20881-v4.patch, HBASE-20881-v4.patch, 
> HBASE-20881-v5.patch, HBASE-20881-v6.patch, HBASE-20881-v7.patch, 
> HBASE-20881-v7.patch, HBASE-20881.patch
>
>
> Now we have an AssignProcedure, an UnassignProcedure, and also a 
> MoveRegionProcedure which schedules an AssignProcedure and an 
> UnassignProcedure to move a region. This makes the logic a bit complicated, as 
> MRP is not a RIT, so when SCP cannot interrupt it directly...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
