[ https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575920#comment-16575920 ]
Allan Yang commented on HBASE-20976:
------------------------------------

[~stack], [~Apache9]. Sorry, I have to reopen this again, since I found another case where an SCP can be scheduled multiple times, as you can see from the issue's description.

1. The RS expired, and an SCP was submitted
{code}
2018-08-09 12:29:55,665 WARN [PEWorker-11] master.ServerManager: Expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 but server not online
2018-08-09 12:29:55,665 INFO [PEWorker-11] master.ServerManager: Processing expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 on izbp1azj9xjvk1h9vioyvfz,16000,1533787159573
2018-08-09 12:29:55,815 DEBUG [PEWorker-11] assignment.AssignmentManager: Added=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 to dead servers, submitted shutdown handler to be executed meta=false
{code}
2. The RS restarted on the same host, and the servername was removed from the dead servers list
{code}
2018-08-09 12:29:58,034 DEBUG [RpcServer.default.FPBQ.Fifo.handler=157,queue=13,port=16000] master.ServerManager: REPORT: Server izbp1azj9xjvk1h9vioyvfz,16020,1533787086010 came back up, removed it from the dead servers list
{code}
3. An UnassignProcedure detects the dead server too; since it thinks no one is handling it, an SCP is submitted again
{code}
2018-08-09 12:29:58,061 WARN [PEWorker-15] assignment.RegionTransitionProcedure: Remote call failed izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; pid=4034, ppid=4012, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26, server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645; rit=CLOSING, location=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; exception=NoServerDispatchException
org.apache.hadoop.hbase.procedure2.NoServerDispatchException: izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; pid=4034, ppid=4012, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26, server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645
        at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:263)
        at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:207)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:349)
        at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:101)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1498)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1278)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1785)
2018-08-09 12:29:58,061 WARN [PEWorker-15] assignment.UnassignProcedure: Expiring izbp1azj9xjvk1h9vioyvfz,16020,1533725024975, pid=4034, ppid=4012, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26, server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645 rit=CLOSING, location=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; exception=NoServerDispatchException
2018-08-09 12:29:58,061 WARN [PEWorker-15] master.ServerManager: Expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 but server not online
2018-08-09 12:29:58,061 INFO [PEWorker-15] master.ServerManager: Processing expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 on izbp1azj9xjvk1h9vioyvfz,16000,1533787159573
2018-08-09 12:29:58,540 DEBUG [PEWorker-15] assignment.AssignmentManager: Added=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 to dead servers, submitted shutdown handler to be executed meta=false
{code}

> SCP can be scheduled multiple times for the same RS
> ---------------------------------------------------
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed, and an SCP was submitted for it
> 2. Before this SCP finished, the Master crashed
> 3. The new master scans the meta table and finds that some regions are still open
> on a dead server
> 4. The new master submits an SCP for the dead server again
> The two SCPs for the same RS can even execute concurrently without
> HBASE-20846…
> The patch provides a test case that reproduces this issue, along with a fix.
> Another case where an SCP might be scheduled multiple times for the same RS (with
> HBASE-20708):
> 1. An RS crashed, and an SCP was submitted for it
> 2. A new RS started on the same host, and the old RS's ServerName was removed from
> DeadServer.deadServers
> 3. After the SCP passed the Handle_RIT state, an UnassignProcedure needs to
> send a close region operation to the crashed RS
> 4. The UnassignProcedure's dispatch failed with 'NoServerDispatchException'
> 5. The RS starts to be expired, but it is found to be neither online nor in the
> deadServers list, so an SCP is submitted for the same RS again

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
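
The second case in the quoted description (steps 1 to 5) is a check-then-act gap: the expiration path only looks at the online and dead server sets, and the dead server entry disappears once a new RS reports in from the same host. The following is a minimal toy model of that sequence in Java. It is not HBase code and not the attached patch; ToyMaster, expireServer, regionServerStartup, serversWithActiveScp and the server names are all hypothetical names chosen for illustration, and the guard shown is only one possible way to avoid the duplicate.

```java
import java.util.HashSet;
import java.util.Set;

public class ScpDedupSketch {

    /** Toy stand-in for the master's bookkeeping. All names are hypothetical, not HBase APIs. */
    static class ToyMaster {
        private final Set<String> onlineServers = new HashSet<>();
        private final Set<String> deadServers = new HashSet<>();
        // Assumed extra bookkeeping: servers that already have an SCP in flight.
        private final Set<String> serversWithActiveScp = new HashSet<>();
        private final boolean guardEnabled;
        private int scpSubmitted = 0;

        ToyMaster(boolean guardEnabled) {
            this.guardEnabled = guardEnabled;
        }

        /** Steps 1 and 5: expire a server and submit an SCP unless it looks handled already. */
        synchronized boolean expireServer(String serverName) {
            if (deadServers.contains(serverName)) {
                return false; // already marked dead, so someone is handling it
            }
            if (guardEnabled && !serversWithActiveScp.add(serverName)) {
                return false; // an SCP is still running for this server
            }
            onlineServers.remove(serverName);
            deadServers.add(serverName);
            scpSubmitted++;
            return true;
        }

        /** Step 2: a new RS on the same host reports in and the old entry leaves the dead set. */
        synchronized void regionServerStartup(String newServerName, String oldServerName) {
            deadServers.remove(oldServerName);
            onlineServers.add(newServerName);
        }

        /** Called when the SCP for a server completes; releases the guard entry. */
        synchronized void scpFinished(String serverName) {
            serversWithActiveScp.remove(serverName);
        }

        synchronized int getScpSubmitted() {
            return scpSubmitted;
        }
    }

    /** Replays steps 1 to 5 and returns how many SCPs were submitted for the old RS. */
    public static int replay(boolean guardEnabled) {
        String oldRs = "rs-host,16020,1"; // hypothetical server names
        String newRs = "rs-host,16020,2";
        ToyMaster master = new ToyMaster(guardEnabled);
        master.expireServer(oldRs);               // step 1: RS crashed, SCP submitted
        master.regionServerStartup(newRs, oldRs); // step 2: old name removed from dead set
        master.expireServer(oldRs);               // steps 3-5: failed dispatch expires it again
        return master.getScpSubmitted();
    }

    public static void main(String[] args) {
        System.out.println("SCPs without guard: " + replay(false));
        System.out.println("SCPs with guard: " + replay(true));
    }
}
```

In this model the second expireServer call submits a duplicate when the guard is off, because neither set remembers the old server any more. With the guard on, the in-flight entry survives the restart of the RS and the second submission is refused until scpFinished is called.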