[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665164#comment-16665164 ]
Duo Zhang commented on HBASE-20973:
-----------------------------------

Ping [~stack] and [~allan163]. I will prepare a patch soon. Anyway, let's revert the previous patch first, as it does not help.

> ArrayIndexOutOfBoundsException when rolling back procedure
> ----------------------------------------------------------
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-20973-UT.patch, HBASE-20973.branch-2.0.001.patch, HBASE-20973.branch-2.0.002.patch
>
>
> Found this one while investigating HBASE-20921. After the root procedure (ModifyTableProcedure in this case) rolled back, an ArrayIndexOutOfBoundsException was thrown:
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): CODE-BUG: Uncaught runtime exception for pid=5973, state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.lang.NullPointerException; ModifyTableProcedure table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled state=MODIFY_TABLE_REOPEN_ALL_REGIONS
>     at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
>     at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN [PEWorker-8] procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
>     at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
>     at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
>     at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition. After this exception was thrown, the exclusive lock held by ModifyTableProcedure was never released, so all procedures against this table were blocked until the master restarted. Since the lock info for the procedure is not restored on restart, the other procedures could then proceed; it is quite embarrassing that a bug saved us (this bug will be fixed in HBASE-20846).
> I tried to reproduce this one using the test case in HBASE-20921, but I just can't reproduce it.
> An easy way to resolve this is to add a try/catch, making sure that no matter what happens, the table's exclusive lock is always released.
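The proposed fix (release the table's exclusive lock no matter what happens during rollback) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not HBase's actual ProcedureExecutor code: the class `TableLockSketch`, its methods, and the simulated store failure are all hypothetical, and a plain `ReentrantLock` stands in for the scheduler's table exclusive lock. The point is that putting the unlock in a `finally` clause means an unexpected runtime exception from the store (like the `ArrayIndexOutOfBoundsException` from `ProcedureStoreTracker`) can no longer leave the lock held.

```java
import java.util.concurrent.locks.ReentrantLock;

public class TableLockSketch {
    // Hypothetical stand-in for the table exclusive lock held by ModifyTableProcedure.
    private final ReentrantLock tableExclusiveLock = new ReentrantLock();

    // Hypothetical stand-in for deleting the rolled-back procedure from the procedure store.
    // When simulateBug is true it throws, mimicking the ArrayIndexOutOfBoundsException above.
    private void deleteFromStore(long procId, boolean simulateBug) {
        if (simulateBug) {
            throw new ArrayIndexOutOfBoundsException(1);
        }
    }

    /** Rolls back a procedure; the table lock is released even if the store throws. */
    public boolean rollback(long procId, boolean simulateBug) {
        tableExclusiveLock.lock();
        try {
            deleteFromStore(procId, simulateBug);
            return true;
        } catch (RuntimeException e) {
            // Report the failure instead of letting the worker thread die with the lock held.
            System.err.println("rollback of pid=" + procId + " failed: " + e);
            return false;
        } finally {
            // Runs on both the normal and the exceptional path.
            tableExclusiveLock.unlock();
        }
    }

    public boolean isTableLocked() {
        return tableExclusiveLock.isLocked();
    }

    public static void main(String[] args) {
        TableLockSketch sketch = new TableLockSketch();
        boolean ok = sketch.rollback(5973L, true);
        System.out.println("rollback ok=" + ok + ", table locked=" + sketch.isTableLocked());
    }
}
```

Running `main` prints `rollback ok=false, table locked=false` to stdout: the simulated store bug still fails the rollback, but other procedures against the table are no longer blocked by a leaked lock.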