[ 
https://issues.apache.org/jira/browse/HBASE-20921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552867#comment-16552867
 ] 

Duo Zhang commented on HBASE-20921:
-----------------------------------

The problem with your patch is that we do not need to check all the regions of 
the table; we only need to check the ones we recorded before. In your case, the 
region has been removed by a merge or split, so the RegionStateNode is null and 
causes the NPE. We could just add a null check there: if the RegionStateNode is 
null, we can be sure that we do not need to deal with it any more (just as you 
said in the description).
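
To illustrate the null check suggested above, here is a minimal, self-contained 
sketch. It is not the actual HBase code: the Node class, the stillPending 
method, and the rule "reopened means open with a larger openSeqNum than the 
recorded one" are all simplifying assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReopenCheckSketch {
  // Hypothetical stand-in for RegionStateNode; only the fields this sketch needs.
  static final class Node {
    final boolean open;
    final long openSeqNum;

    Node(boolean open, long openSeqNum) {
      this.open = open;
      this.openSeqNum = openSeqNum;
    }
  }

  // Returns the recorded regions that still have to be reopened.
  // 'recorded' maps region name to the openSeqNum seen when the procedure started.
  static List<String> stillPending(Map<String, Node> states, Map<String, Long> recorded) {
    List<String> pending = new ArrayList<>();
    for (Map.Entry<String, Long> e : recorded.entrySet()) {
      Node node = states.get(e.getKey());
      if (node == null) {
        // The region was removed by a merge or split; nothing left to check.
        continue;
      }
      // Assumed rule: reopened means open with a sequence number above the recorded one.
      if (!node.open || node.openSeqNum <= e.getValue()) {
        pending.add(e.getKey());
      }
    }
    return pending;
  }

  public static void main(String[] args) {
    Map<String, Node> states = new LinkedHashMap<>();
    states.put("regionA", new Node(true, 10L));
    states.put("regionB", new Node(true, 9L));
    // "regionMerged" is deliberately absent, as if it had been merged away.

    Map<String, Long> recorded = new LinkedHashMap<>();
    recorded.put("regionA", 5L);
    recorded.put("regionMerged", 7L);
    recorded.put("regionB", 9L);

    System.out.println(stillPending(states, recorded)); // prints [regionB]
  }
}
```

Only the recorded regions are visited, and a missing entry is skipped instead 
of dereferenced, so the table's region list does not need to be refreshed.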

Thanks.

> Possible NPE in ReopenTableRegionsProcedure
> -------------------------------------------
>
>                 Key: HBASE-20921
>                 URL: https://issues.apache.org/jira/browse/HBASE-20921
>             Project: HBase
>          Issue Type: Sub-task
>          Components: amv2
>    Affects Versions: 3.0.0, 2.1.0, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-20921.branch-2.0.001.patch
>
>
> After HBASE-20752, we issue a ReopenTableRegionsProcedure in 
> ModifyTableProcedure to ensure all regions are reopened.
> But ModifyTableProcedure and ReopenTableRegionsProcedure do not hold the 
> lock (why?), so there is a chance that a merge/split procedure executes 
> while ModifyTableProcedure is executing.
> So, when ReopenTableRegionsProcedure reaches the state 
> "REOPEN_TABLE_REGIONS_CONFIRM_REOPENED", some of the persisted regions to 
> check no longer exist, and an NPE is thrown.
> {code}
> 2018-07-18 01:38:57,528 INFO  [PEWorker-9] procedure2.ProcedureExecutor(1246): Finished pid=6110, state=SUCCESS; MergeTableRegionsProcedure table=IntegrationTestBigLinkedList, regions=[845d286231eb01b71aeaa17b0e30058d, 4a46ab0918c99cada72d5336ad83a828], forcibly=false in 10.8610sec
> 2018-07-18 01:38:57,530 ERROR [PEWorker-8] procedure2.ProcedureExecutor(1478): CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList
> java.lang.NullPointerException
>         at org.apache.hadoop.hbase.master.assignment.RegionStates.checkReopened(RegionStates.java:651)
>         at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
>         at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>         at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>         at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>         at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>         at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>         at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>         at org.apache.hadoop.hbase.master.procedure.ReopenTableRegionsProcedure.executeFromState(ReopenTableRegionsProcedure.java:102)
>         at org.apache.hadoop.hbase.master.procedure.ReopenTableRegionsProcedure.executeFromState(ReopenTableRegionsProcedure.java:45)
>         at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>         at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1453)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> I think we need to refresh the region list of the table at the 
> "REOPEN_TABLE_REGIONS_CONFIRM_REOPENED" state. We do not need to check 
> regions that were merged or split, since we can be sure that they were 
> opened after we changed the table descriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
