[ https://issues.apache.org/jira/browse/HBASE-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625812#comment-16625812 ]
Duo Zhang commented on HBASE-21222:
-----------------------------------

Got it. So we need a tool in HBCK2 to handle this case.

> [amv2] Closing region on a non-existent server creates STUCK regions
> --------------------------------------------------------------------
>
> Key: HBASE-21222
> URL: https://issues.apache.org/jira/browse/HBASE-21222
> Project: HBase
> Issue Type: Bug
> Components: amv2
> Reporter: stack
> Assignee: stack
> Priority: Major
>
> Ran into this one where a Region had been on a server but after a bunch of
> crashing and meddling in Master Proc WALs, any attempt at unassign has the
> procedure fail (see below) and then report the region as STUCK.
> I broke the lock w/ new hbck2 tooling and then tried to offline again but
> same thing happened. Bug. Fix.
> {code}
> 2018-09-22 18:36:41,900 INFO
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Dispatch
> pid=138650, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH,
> locked=true; UnassignProcedure
> table=IntegrationTestBigLinkedList_20180614072614,
> region=51cdade76ca7217ec191f39e5f56c61c,
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING,
> location=vd0637.halxg.cloudera.com,22101,1537397969558
> 2018-09-22 18:36:41,899 INFO
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler:
> pid=138646, ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614,
> region=0780467efe4c5901887fb12bfa406fa7,
> server=vc1228.halxg.cloudera.com,22101,1537578279837 checking lock on
> 0780467efe4c5901887fb12bfa406fa7
> 2018-09-22 18:36:41,900 WARN
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: Remote
> call failed vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650,
> ppid=121871, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true;
> UnassignProcedure table=IntegrationTestBigLinkedList_20180614072614,
> region=51cdade76ca7217ec191f39e5f56c61c,
> server=vd0637.halxg.cloudera.com,22101,1537397969558; rit=CLOSING,
> location=vd0637.halxg.cloudera.com,22101,1537397969558;
> exception=NoServerDispatchException
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException:
> vd0637.halxg.cloudera.com,22101,1537397969558; pid=138650, ppid=121871,
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure
> table=IntegrationTestBigLinkedList_20180614072614,
> region=51cdade76ca7217ec191f39e5f56c61c,
> server=vd0637.halxg.cloudera.com,22101,1537397969558
>     at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:277)
>     at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:202)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:370)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:924)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1684)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1471)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:77)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1983)
> 2018-09-22 18:36:41,903 WARN
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure: Expiring
> vd0637.halxg.cloudera.com,22101,1537397969558, pid=138650, ppid=121871,
> state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true; UnassignProcedure
> table=IntegrationTestBigLinkedList_20180614072614,
> region=51cdade76ca7217ec191f39e5f56c61c,
> server=vd0637.halxg.cloudera.com,22101,1537397969558 rit=CLOSING,
> location=vd0637.halxg.cloudera.com,22101,1537397969558;
> exception=NoServerDispatchException
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)