[ https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580372#comment-16580372 ]
Josh Elser commented on HBASE-20976:
------------------------------------

Coming in late...
{quote}the worst case is that there is a race condition so we still schedule redundant SCPs, still better than now I think
{quote}
{quote}Yes. Could age them out instead... i.e. a deadserver needs to stick around for an hour at least?
{quote}
What's the downside of this: do we run an SCP for a RS that was already processed, or something worse? As long as SCP is idempotent, we'd just want to reduce the likelihood that we do things multiple times (maybe I'm incorrectly assuming that SCP is idempotent ;))

> SCP can be scheduled multiple times for the same RS
> ---------------------------------------------------
>
>                 Key: HBASE-20976
>                 URL: https://issues.apache.org/jira/browse/HBASE-20976
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-20976.branch-2.0.001.patch, HBASE-20976.branch-2.0.002.patch
>
>
> SCP can be scheduled multiple times for the same RS:
> 1. A RS crashed, and an SCP was submitted for it.
> 2. Before this SCP finished, the Master crashed.
> 3. The new master scans the meta table and finds that some regions are still open on a dead server.
> 4. The new master submits an SCP for the dead server again.
> Without HBASE-20846…, the two SCPs for the same RS can even execute concurrently.
> The patch provides a test case that reproduces this issue, and a fix.
> Another case where SCP might be scheduled multiple times for the same RS (with HBASE-20708):
> 1. A RS crashed, and an SCP was submitted for it.
> 2. A new RS started on the same host, and the old RS's ServerName was removed from DeadServer.deadServers.
> 3. After the SCP passed the Handle_RIT state, an UnassignProcedure needed to send a close region operation to the crashed RS.
> 4. The UnassignProcedure's dispatch failed with a 'NoServerDispatchException'.
> 5. Expiring the RS begins, but it is found neither online nor in the deadServers list, so an SCP is submitted for the same RS again.
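
To make the "age them out" idea from the comment concrete, here is a minimal sketch of a dead-server tracker that admits only one crash-procedure submission per server name and keeps each entry for a fixed retention window. Everything in it is hypothetical and for illustration only: `DeadServerTrackerSketch`, `trySchedule`, and `ageOut` are not HBase classes or methods, and the state is held in memory, so a restarted master would have to rebuild it before this could help with the first scenario.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch, not an HBase class: dedupes crash-procedure submissions. */
final class DeadServerTrackerSketch {
  // Server name -> time (ms) at which the server was first reported dead.
  private final ConcurrentMap<String, Long> deadSince = new ConcurrentHashMap<>();
  // How long a dead server stays in the map, e.g. one hour.
  private final long retentionMillis;

  DeadServerTrackerSketch(long retentionMillis) {
    this.retentionMillis = retentionMillis;
  }

  /**
   * Returns true only for the first caller that reports this server as dead.
   * putIfAbsent is atomic, so two racing callers cannot both get true.
   */
  boolean trySchedule(String serverName, long nowMillis) {
    return deadSince.putIfAbsent(serverName, nowMillis) == null;
  }

  /** Drops entries whose age has reached the retention window. */
  void ageOut(long nowMillis) {
    deadSince.entrySet().removeIf(e -> nowMillis - e.getValue() >= retentionMillis);
  }
}
```

A caller would submit a crash procedure only when `trySchedule` returns true. Because entries are removed by age rather than when a new RS starts on the same host, a late expire request for the old server name inside the window would be rejected instead of producing a second SCP.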