[ https://issues.apache.org/jira/browse/HBASE-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl updated HBASE-8127:
---------------------------------

    Fix Version/s: 0.94.7

> Region of a disabling or disabled table could be stuck in transition state when RS dies during Master initialization
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8127
>                 URL: https://issues.apache.org/jira/browse/HBASE-8127
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.5
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.94.7
>
>         Attachments: hbase-8127_v1.patch, reproduce-hang.patch
>
>
> The issue happens when a region server (RS) dies while a master is starting up. If the RS reports OPEN to the new master instance and dies immediately afterwards, the regions in transition (RITs) of disabling (or disabled) tables on the dead RS stay in RIT state forever.
> I attached a patch that simulates the situation. Run the following command to reproduce the issue:
> {code}mvn test -PlocalTests -Dtest=TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS{code}
> Basically, AssignmentManager (AM) skips the regions of a dead server inside AM.processDeadServersAndRecoverLostRegions with the following code, and relies on ServerShutdownHandler (SSH) to process those skipped regions:
> {code}
> for (Pair<HRegionInfo, Result> deadRegion : deadServer.getValue()) {
>   // Leave this dead server's regions for SSH to handle
>   nodes.remove(deadRegion.getFirst().getEncodedName());
> }
> {code}
> In SSH, however, processDeadRegion skips the regions of disabling (or disabled) tables again. As a result, neither path handles them, and the RITs of disabling (or disabled) tables are stuck forever.
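To make the hand-off gap concrete, here is a minimal Java sketch of the two skip decisions described in the issue. It is a simplified model, not HBase's AssignmentManager or ServerShutdownHandler code: the class `StuckRitSketch`, the `TableState` enum and the `stuckInTransition` method are hypothetical, and regions and tables are plain strings. It assumes that failover processing defers every RIT region of the dead server to SSH, and that SSH skips every region whose table is DISABLING or DISABLED; a region that meets both conditions is reported as stuck.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class StuckRitSketch {
    // Hypothetical, simplified table states; not HBase's ZKTable API.
    public enum TableState { ENABLED, DISABLING, DISABLED }

    /**
     * Returns the encoded names of RIT regions that neither the modeled
     * failover processing nor the modeled shutdown handler ever processes.
     *
     * @param tableStates       table name -> state
     * @param deadServerRegions encoded region name -> table name, for regions
     *                          hosted on the RS that died during master startup
     * @param ritNodes          encoded names of regions currently in transition
     */
    public static Set<String> stuckInTransition(Map<String, TableState> tableStates,
                                                Map<String, String> deadServerRegions,
                                                Set<String> ritNodes) {
        Set<String> stuck = new TreeSet<>();
        for (Map.Entry<String, String> e : deadServerRegions.entrySet()) {
            String region = e.getKey();
            // Regions that are not in transition cannot get stuck in RIT.
            if (!ritNodes.contains(region)) {
                continue;
            }
            // Step 1 (failover): this RIT region belongs to the dead server,
            // so the master drops it from its own work list and defers it to SSH.
            // Step 2 (SSH): regions of DISABLING or DISABLED tables are skipped.
            TableState state = tableStates.getOrDefault(e.getValue(), TableState.ENABLED);
            if (state == TableState.DISABLING || state == TableState.DISABLED) {
                // Skipped by both paths: this RIT entry is never cleared.
                stuck.add(region);
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        Map<String, TableState> tables = new TreeMap<>();
        tables.put("t_enabled", TableState.ENABLED);
        tables.put("t_disabling", TableState.DISABLING);
        tables.put("t_disabled", TableState.DISABLED);

        Map<String, String> onDeadRs = new TreeMap<>();
        onDeadRs.put("r1", "t_enabled");
        onDeadRs.put("r2", "t_disabling");
        onDeadRs.put("r3", "t_disabled");
        onDeadRs.put("r4", "t_disabling"); // on the dead RS but not in transition

        // r5 is in transition but was hosted on a live server.
        Set<String> rit = new TreeSet<>(Set.of("r1", "r2", "r3", "r5"));

        System.out.println(stuckInTransition(tables, onDeadRs, rit)); // prints [r2, r3]
    }
}
```

With this sample data, r1 belongs to an enabled table and would be reassigned by SSH, r4 is not in transition, and r5 was not on the dead RS, so only r2 and r3 are reported. The sketch only illustrates why those RIT entries are orphaned; it does not model the change made in hbase-8127_v1.patch.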