[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652954#comment-16652954 ]
stack commented on HBASE-20973:
-------------------------------

Here is another example:

{code}
2018-10-16 14:06:47,975 WARN org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rollback because parent is done/rolledback proc=pid=1337789, ppid=1275219, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=IntegrationTestBigLinkedList_20180626064758, region=cda7c63e2cfee082e8d0d7ee5fc28a20
2018-10-16 14:06:47,976 WARN org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.hadoop.hbase.procedure2.store.BitSetNode.updateState(BitSetNode.java:396)
        at org.apache.hadoop.hbase.procedure2.store.BitSetNode.delete(BitSetNode.java:155)
        at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:153)
        at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:138)
        at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:782)
        at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:729)
        at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:616)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1684)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1475)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:79)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2059)
{code}

See it on startup.

> ArrayIndexOutOfBoundsException when rolling back procedure
> ----------------------------------------------------------
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Critical
>
> Found this one while investigating HBASE-20921. After the root procedure (ModifyTableProcedure in this case) rolled back, an ArrayIndexOutOfBoundsException was thrown:
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): CODE-BUG: Uncaught runtime exception for pid=5973, state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.lang.NullPointerException; ModifyTableProcedure table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled state=MODIFY_TABLE_REOPEN_ALL_REGIONS
>         at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
>         at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
>         at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN [PEWorker-8] procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
>         at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
>         at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
>         at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
>         at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
>         at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
>         at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition. After this exception was thrown, the exclusive lock held by ModifyTableProcedure was never released, and all procedures against this table were blocked.
> They stayed blocked until the master restarted. Since the lock info for the procedure is not restored after the restart, the other procedures could then run again. It is quite embarrassing that a bug saved us... (that bug will be fixed in HBASE-20846).
> I tried to reproduce this one using the test case in HBASE-20921, but I could not.
> An easy way to resolve this is to add a try/catch, making sure that no matter what happens, the table's exclusive lock is always released.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
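
The suggestion in the description (guard the rollback so the table's exclusive lock is always released) could look roughly like the sketch below. It is an illustration only, not the HBase fix: `RollbackLockSketch`, `StoreDelete`, and `rollbackAndRelease` are hypothetical names, a plain `ReentrantLock` stands in for the table's exclusive lock, and it uses try/finally rather than a literal try/catch so the original exception still propagates to the caller.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names are HBase APIs.
final class RollbackLockSketch {

    /** Stand-in for the store call that threw ArrayIndexOutOfBoundsException in the report. */
    interface StoreDelete {
        void delete(long procId);
    }

    /**
     * Takes the (stand-in) table lock, runs the store delete, and releases
     * the lock in a finally block so a runtime exception cannot leak it.
     */
    static void rollbackAndRelease(ReentrantLock tableLock, long procId, StoreDelete store) {
        tableLock.lock();
        try {
            store.delete(procId); // may throw, as in the stack trace above
        } finally {
            tableLock.unlock();   // runs whether or not delete() threw
        }
    }

    public static void main(String[] args) {
        ReentrantLock tableLock = new ReentrantLock();
        try {
            // Simulate the failing store update from the report.
            rollbackAndRelease(tableLock, 5973L, id -> {
                throw new ArrayIndexOutOfBoundsException(1);
            });
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("store delete failed, exception propagated");
        }
        System.out.println("lock still held: " + tableLock.isLocked());
    }
}
```

With this shape the failure is still visible to whoever called the rollback, but other work waiting on the same lock is not blocked until a restart.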