[ https://issues.apache.org/jira/browse/HBASE-20881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577508#comment-16577508 ]
Duo Zhang commented on HBASE-20881:
-----------------------------------

Looped 100 times locally and it finally failed with
{noformat}
2018-08-12 19:57:18,174 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): CODE-BUG: Uncaught runtime exception for pid=83, state=FAILED:SPLIT_TABLE_REGION_UPDATE_META, hasLock=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts 10 exceeded; SplitTableRegionProcedure table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4, daughterA=de5ab31764b272230cb50ca31b8ecbdb, daughterB=0d190bf20801e4bb12d6aaf40e971340
java.lang.UnsupportedOperationException: pid=83, state=FAILED:SPLIT_TABLE_REGION_PRE_OPERATION_AFTER_META, hasLock=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts 10 exceeded; SplitTableRegionProcedure table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4, daughterA=de5ab31764b272230cb50ca31b8ecbdb, daughterB=0d190bf20801e4bb12d6aaf40e971340 unhandled state=SPLIT_TABLE_REGION_PRE_OPERATION_AFTER_META
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:320)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:1)
	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:886)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1436)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1392)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1270)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$7(ProcedureExecutor.java:1251)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1822)
2018-08-12 19:57:18,174 WARN [PEWorker-1] procedure2.ProcedureExecutor$Testing(98): Toggle KILL before store update to: true
2018-08-12 19:57:18,192 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): CODE-BUG: Uncaught runtime exception for pid=83, state=FAILED:SPLIT_TABLE_REGION_PRE_OPERATION_BEFORE_META, hasLock=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts 10 exceeded; SplitTableRegionProcedure table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4, daughterA=de5ab31764b272230cb50ca31b8ecbdb, daughterB=0d190bf20801e4bb12d6aaf40e971340
java.lang.UnsupportedOperationException: pid=83, state=FAILED:SPLIT_TABLE_REGION_UPDATE_META, hasLock=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via TransitRegionStateProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts 10 exceeded; SplitTableRegionProcedure table=testRecoveryAndDoubleExecution, parent=2b370ab236c7bd08956fc25f712f49e4, daughterA=de5ab31764b272230cb50ca31b8ecbdb, daughterB=0d190bf20801e4bb12d6aaf40e971340 unhandled state=SPLIT_TABLE_REGION_UPDATE_META
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:320)
	at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.rollbackState(SplitTableRegionProcedure.java:1)
	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:208)
	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:886)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1436)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1392)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1270)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$7(ProcedureExecutor.java:1251)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1822)
{noformat}
Let me dig more.

> Introduce a region transition procedure to handle all the state transition
> for a region
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-20881
>                 URL: https://issues.apache.org/jira/browse/HBASE-20881
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2, proc-v2
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0
>
>         Attachments: HBASE-20881-v1.patch, HBASE-20881-v2.patch,
> HBASE-20881-v3.patch, HBASE-20881-v4.patch, HBASE-20881-v4.patch,
> HBASE-20881-v5.patch, HBASE-20881-v6.patch, HBASE-20881-v7.patch,
> HBASE-20881-v7.patch, HBASE-20881.patch
>
>
> We now have an AssignProcedure, an UnassignProcedure, and also a
> MoveRegionProcedure which schedules an AssignProcedure and an
> UnassignProcedure to move a region. This makes the logic a bit complicated, as
> MRP is not a RIT, so when SCP cannot interrupt it directly...

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)