[
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575920#comment-16575920
]
Allan Yang commented on HBASE-20976:
------------------------------------
[~stack], [~Apache9]: Sorry to reopen this again, but I found another case in
which an SCP can be scheduled multiple times for the same RS. The case is in
the issue's description; the logs show the following sequence:
1. The RS expired, and an SCP was submitted:
{code}
2018-08-09 12:29:55,665 WARN [PEWorker-11] master.ServerManager: Expiration of
izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 but server not online
2018-08-09 12:29:55,665 INFO [PEWorker-11] master.ServerManager: Processing
expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 on
izbp1azj9xjvk1h9vioyvfz,16000,1533787159573
2018-08-09 12:29:55,815 DEBUG [PEWorker-11] assignment.AssignmentManager:
Added=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 to dead servers, submitted
shutdown handler to be executed meta=false
{code}
2. The RS restarted on the same host, and the server name was removed from the
dead servers list:
{code}
2018-08-09 12:29:58,034 DEBUG
[RpcServer.default.FPBQ.Fifo.handler=157,queue=13,port=16000]
master.ServerManager: REPORT: Server
izbp1azj9xjvk1h9vioyvfz,16020,1533787086010 came back up, removed it from
the dead servers list
{code}
3. An UnassignProcedure then detected the dead RS as well. Since it found no
one handling the RS, an SCP was submitted again:
{code}
2018-08-09 12:29:58,061 WARN [PEWorker-15]
assignment.RegionTransitionProcedure: Remote call failed
izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; pid=4034, ppid=4012,
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure
table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26,
server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645; rit=CLOSING,
location=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975;
exception=NoServerDispatchException
org.apache.hadoop.hbase.procedure2.NoServerDispatchException:
izbp1azj9xjvk1h9vioyvfz,16020,1533725024975; pid=4034, ppid=4012,
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure
table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26,
server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645
    at org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:177)
    at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.addToRemoteDispatcher(RegionTransitionProcedure.java:263)
    at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:207)
    at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:349)
    at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:101)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1498)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1278)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1785)
2018-08-09 12:29:58,061 WARN [PEWorker-15] assignment.UnassignProcedure:
Expiring izbp1azj9xjvk1h9vioyvfz,16020,1533725024975, pid=4034, ppid=4012,
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure
table=randowmWrite15, region=e07c5ad01ce7b76b80a92e809fb98e26,
server=izbp1azj9xjvk1h9vioyvfz,16020,1533735186645 rit=CLOSING,
location=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975;
exception=NoServerDispatchException
2018-08-09 12:29:58,061 WARN [PEWorker-15] master.ServerManager: Expiration of
izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 but server not online
2018-08-09 12:29:58,061 INFO [PEWorker-15] master.ServerManager: Processing
expiration of izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 on
izbp1azj9xjvk1h9vioyvfz,16000,1533787159573
2018-08-09 12:29:58,540 DEBUG [PEWorker-15] assignment.AssignmentManager:
Added=izbp1azj9xjvk1h9vioyvfz,16020,1533725024975 to dead servers, submitted
shutdown handler to be executed meta=false
{code}
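The three steps above can be reproduced in a toy model. This is only a sketch to illustrate the race: ToyServerManager, its fields, and the "guardEnabled" flag are hypothetical names, not HBase's ServerManager API, and the guard shown (remembering which servers already had an SCP submitted) is just one possible way to reject the duplicate, not necessarily what the attached patch does. The startup method also simplifies the real behaviour by dropping every dead-list entry that shares the host and port.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical classes, NOT the real HBase ServerManager.
class ScpRaceSketch {

  static class ToyServerManager {
    private final Set<String> onlineServers = new HashSet<>();
    private final Set<String> deadServers = new HashSet<>();
    // Hypothetical guard: servers for which an SCP was already submitted.
    private final Set<String> scpSubmitted = new HashSet<>();
    // Every SCP submission, in order, so duplicates can be counted.
    private final List<String> submittedScps = new ArrayList<>();
    private final boolean guardEnabled;

    ToyServerManager(boolean guardEnabled) {
      this.guardEnabled = guardEnabled;
    }

    // Steps 1 and 3: expire a server and submit an SCP unless it looks handled.
    boolean expireServer(String serverName) {
      if (deadServers.contains(serverName)) {
        return false; // already in the dead servers list, someone is handling it
      }
      if (guardEnabled && !scpSubmitted.add(serverName)) {
        return false; // an SCP was already submitted for this exact server name
      }
      onlineServers.remove(serverName);
      deadServers.add(serverName);
      submittedScps.add(serverName);
      return true;
    }

    // Step 2: a new RS on the same host and port reports in, and older
    // instances are dropped from the dead servers list (simplified).
    void regionServerStartup(String host, int port, long startCode) {
      String prefix = host + "," + port + ",";
      deadServers.removeIf(s -> s.startsWith(prefix));
      onlineServers.add(prefix + startCode);
    }

    long scpCount(String serverName) {
      return submittedScps.stream().filter(serverName::equals).count();
    }
  }

  // Replays the sequence: expire, restart on the same host, expire again.
  static long replay(boolean guardEnabled) {
    String oldRs = "host1,16020,100";
    ToyServerManager manager = new ToyServerManager(guardEnabled);
    manager.expireServer(oldRs);                     // step 1: first SCP
    manager.regionServerStartup("host1", 16020, 200L); // step 2: dead list cleared
    manager.expireServer(oldRs);                     // step 3: UnassignProcedure expires it again
    return manager.scpCount(oldRs);
  }

  public static void main(String[] args) {
    System.out.println("SCPs without guard: " + replay(false)); // prints 2
    System.out.println("SCPs with guard: " + replay(true));     // prints 1
  }
}
```

Without the guard, the second expireServer call passes the "not in dead servers" check because step 2 removed the entry, so the same server name gets two SCPs.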
> SCP can be scheduled multiple times for the same RS
> ---------------------------------------------------
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> SCP can be scheduled multiple times for the same RS:
> 1. An RS crashed, and an SCP was submitted for it.
> 2. Before this SCP finished, the Master crashed.
> 3. The new Master scans the meta table and finds that some regions are still
> open on a dead server.
> 4. The new Master submits an SCP for the dead server again.
> The two SCPs for the same RS can even execute concurrently without
> HBASE-20846…
> The patch provides a test case that reproduces this issue and a fix.
> Another case in which an SCP might be scheduled multiple times for the same
> RS (with HBASE-20708):
> 1. An RS crashed, and an SCP was submitted for it.
> 2. A new RS started on the same host, and the old RS's ServerName was removed
> from DeadServer.deadServers.
> 3. After the SCP passed the Handle_RIT state, an UnassignProcedure needed to
> send a close region operation to the crashed RS.
> 4. The UnassignProcedure's dispatch failed with 'NoServerDispatchException'.
> 5. The UnassignProcedure began to expire the RS, found it neither online nor
> in the dead server list, and so submitted an SCP for the same RS again.
>