[ https://issues.apache.org/jira/browse/HBASE-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell updated HBASE-14129:
-----------------------------------
    Fix Version/s:     (was: 0.98.16)

> If any regionserver gets shutdown uncleanly during full cluster restart, locality looks to be lost
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14129
>                 URL: https://issues.apache.org/jira/browse/HBASE-14129
>             Project: HBase
>          Issue Type: Bug
>            Reporter: churro morales
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: HBASE-14129.patch
>
>
> We were doing a cluster restart the other day, and some regionservers did
> not shut down cleanly. After the restart, our locality dropped from 99% to
> 5%. Looking at the code, AssignmentManager.joinCluster() calls
> AssignmentManager.processDeadServersAndRegionsInTransition(), and if the
> failover flag gets set for any reason, assignAllUserRegions() is never
> called. The balancer then ends up assigning those regions; since we do not
> use a locality-aware balancer, we lost our region locality.
> I don't have a solid grasp of the reasoning behind these checks, but there
> are some potential workarounds:
> 1. After shutting down your cluster, move your WALs aside (replay them later).
> 2. Clean up your znodes.
> That works, but requires a lot of manual labor. Another solution, which I
> prefer, would be a flag: ./start-hbase.sh --clean. If the master is started
> with that flag, we add a check in
> AssignmentManager.processDeadServersAndRegionsInTransition() so that, when
> the flag is set, we call assignAllUserRegions() regardless of the failover
> state.
> I have a patch for the latter solution, assuming I am understanding the
> logic correctly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
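The manual workaround described in the report (move the WALs aside, then clear the znodes before a clean restart) might be scripted roughly as follows. This is a sketch, not a tested procedure: the paths `/hbase/WALs` and the znode parent `/hbase` are assumptions that depend on your `hbase.rootdir` and `zookeeper.znode.parent` settings, and the script runs in dry-run mode by default, printing the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the manual "clean restart" workaround from the report.
# ASSUMPTIONS: hbase.rootdir maps to /hbase on HDFS and
# zookeeper.znode.parent is /hbase; adjust both for your site.
# DRY_RUN=1 (the default) prints the commands rather than running them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*"
  else
    "$@"
  fi
}

# 1. With the cluster fully stopped, move the WALs aside so the restarted
#    master does not see the uncleanly-shut-down servers as a failover case
#    (replay the saved WALs later if you need those edits).
run hdfs dfs -mv /hbase/WALs /hbase/WALs.backup

# 2. Clear HBase's znode tree so no stale server or region-in-transition
#    state survives into the new master's failover determination.
run hbase zkcli rmr /hbase
```

Only run this against a fully stopped cluster; deleting the znode tree while any master or regionserver is live will cause further damage.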
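The proposed `--clean` behavior can be illustrated with a small self-contained sketch of the decision made in processDeadServersAndRegionsInTransition(). The class and method names below are hypothetical stand-ins for illustration only, not the actual AssignmentManager code or the attached patch.

```java
// Hypothetical sketch: even when the failover checks trip (stale znodes,
// leftover WALs from an unclean shutdown), a "--clean" startup flag would
// force a fresh assignAllUserRegions() pass, preserving locality.
public class CleanStartupSketch {

    /**
     * Mirrors the choice the reporter describes: today, a detected failover
     * skips assignAllUserRegions() and leaves placement to the (possibly
     * non-locality-aware) balancer. The proposed --clean flag overrides
     * the failover determination.
     */
    static boolean shouldAssignAllUserRegions(boolean failoverDetected,
                                              boolean cleanStartupRequested) {
        return !failoverDetected || cleanStartupRequested;
    }

    public static void main(String[] args) {
        // Unclean shutdown, normal start: balancer places regions and
        // locality can collapse (the 99% -> 5% case from the report).
        System.out.println(shouldAssignAllUserRegions(true, false));
        // Unclean shutdown, started with --clean: fresh full assignment.
        System.out.println(shouldAssignAllUserRegions(true, true));
    }
}
```

The design question the patch raises is exactly this override: whether an operator-supplied flag should be allowed to bypass the failover detection that normally protects regions in transition.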