[jira] [Commented] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

Duo Zhang (JIRA) Sun, 21 Oct 2018 18:50:17 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658472#comment-16658472
 ]


Duo Zhang commented on HBASE-20973:
-----------------------------------

So is the max node size useless? Or are there holes where we miss the max size 
check?

And I'd say the memory waste is huge if we do not grow the BitSetNode. It may 
look like only a few bytes, but the BitSetNode itself also consumes only a few 
bytes, which means that in the worst case the memory usage could double if we 
cannot grow the BitSetNode.
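
To make the trade-off above concrete, here is a minimal sketch of a bitmap 
node that grows in place up to a cap instead of forcing a new node for every 
nearby procedure id. It is not the HBase BitSetNode code: the class name 
GrowableBitNode, the MAX_WORDS cap and the one-bit-per-id layout are all 
assumptions made for illustration. The bounds check before indexing is the 
part that matters for correctness.

```java
import java.util.Arrays;

/** Hypothetical sketch only; NOT the HBase BitSetNode implementation. */
final class GrowableBitNode {
    static final int BITS_PER_WORD = 64;
    static final int MAX_WORDS = 4; // assumed cap, chosen only for this example

    private final long start; // first id covered, aligned down to a multiple of 64
    private long[] words;     // one bit per id, 64 ids per long

    GrowableBitNode(long firstId) {
        this.start = firstId & ~(long) (BITS_PER_WORD - 1);
        this.words = new long[1];
    }

    /** Last id currently covered by this node. */
    long end() {
        return start + (long) words.length * BITS_PER_WORD - 1;
    }

    boolean contains(long id) {
        return id >= start && id <= end();
    }

    /** True if the node could cover id without exceeding the cap. */
    boolean canGrowTo(long id) {
        return id >= start && (id - start) / BITS_PER_WORD < MAX_WORDS;
    }

    /** Marks id, growing the backing array first if needed. */
    void set(long id) {
        if (!contains(id)) {
            if (!canGrowTo(id)) {
                // The caller is expected to create a new node for this id.
                throw new IllegalArgumentException("id " + id + " is out of range for this node");
            }
            int needed = (int) ((id - start) / BITS_PER_WORD) + 1;
            words = Arrays.copyOf(words, needed);
        }
        int offset = (int) (id - start);
        words[offset >> 6] |= 1L << (offset & 63);
    }

    boolean isSet(long id) {
        if (!contains(id)) {
            return false; // never index outside the array
        }
        int offset = (int) (id - start);
        return (words[offset >> 6] & (1L << (offset & 63))) != 0;
    }

    int wordCount() {
        return words.length;
    }
}
```

In this sketch, growing the node to cover a nearby id costs one extra long, 
whereas a separate node would also pay for its own object header and fields.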

Anyway, correctness comes first. I've already filed HBASE-21314 for the 
efficiency problem.

Thanks.

> ArrayIndexOutOfBoundsException when rolling back procedure
> ----------------------------------------------------------
>
>                 Key: HBASE-20973
>                 URL: https://issues.apache.org/jira/browse/HBASE-20973
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Critical
>         Attachments: HBASE-20973.branch-2.0.001.patch
>
>
> Found this one while investigating HBASE-20921. After the root 
> procedure (ModifyTableProcedure in this case) rolled back, an 
> ArrayIndexOutOfBoundsException was thrown:
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
> CODE-BUG: Uncaught runtime exception for pid=5973, 
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, 
> exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime 
> exception: pid=5974, ppid=5973, 
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
> ReopenTableRegionsProcedure 
> table=IntegrationTestBigLinkedList:java.lang.NullPointerException; 
> ModifyTableProcedure table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
>         at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
>         at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
>         at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN  [PEWorker-8] 
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
>         at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
>         at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
>         at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition. After this exception was thrown, the 
> exclusive lock held by ModifyTableProcedure was never released, and all 
> procedures against this table were blocked until the master restarted. Since 
> the lock info for the procedure is not restored after a restart, the other 
> procedures could then proceed. It is quite embarrassing that a bug saved 
> us... (that bug will be fixed in HBASE-20846)
> I tried to reproduce this one using the test case in HBASE-20921, but I 
> could not.
> An easy way to resolve this is to add a try/catch, making sure that no 
> matter what happens, the table's exclusive lock is always released.
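
The guarded-release idea in the description can be sketched as below. This 
only illustrates the try/finally pattern; it is not the attached patch. The 
class and method names are hypothetical, and java.util.concurrent's 
ReentrantLock stands in for the procedure framework's own table lock, which 
works differently.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of "always release the lock"; not HBase code. */
final class RollbackLockSketch {

    /**
     * Runs a rollback step while holding the lock and releases the lock
     * even if the step throws. Returns true if the step completed normally.
     */
    static boolean rollbackWithGuaranteedRelease(ReentrantLock tableLock, Runnable rollbackStep) {
        tableLock.lock();
        try {
            rollbackStep.run();
            return true;
        } catch (RuntimeException e) {
            // A real system would log or record the failure here; this sketch
            // only reports it to the caller.
            return false;
        } finally {
            // Runs on both the normal and the exceptional path.
            tableLock.unlock();
        }
    }
}
```

With this shape, an unexpected runtime exception during rollback (such as the 
ArrayIndexOutOfBoundsException in the trace) no longer leaves the lock held.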



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
