[ https://issues.apache.org/jira/browse/HBASE-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-14129:
-----------------------------------
    Fix Version/s:     (was: 0.98.14)
                   0.98.15
                   1.3.0
           Status: Open  (was: Patch Available)

{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
index f7f98fe..1c3ceee 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
@@ -539,6 +539,12 @@ public class AssignmentManager {
       LOG.info("Clean cluster startup. Assigning user regions");
       assignAllUserRegions(allRegions);
     }
+
+    if (this.server.getConfiguration().getBoolean("hbase.full.cluster.start", false)) {
+      // Hint to do a full cluster startup.
+      LOG.info("Clean cluster startup forced via parameterized startup. Assigning user regions");
+      assignAllUserRegions(allRegions);
+    }
     // unassign replicas of the split parents and the merged regions
     // the daughter replicas are opened in assignAllUserRegions if it was
     // not already opened.
{code}

Can someone who knows the AM better take a quick peek to see whether this is sufficient?
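A minimal sketch of the decision the patch adds, to help judge whether it is sufficient. It is hypothetical and self-contained: java.util.Properties stands in for the HBase Configuration, the failover flag is passed in as a parameter, and the class and method names (FullClusterStartDecision, shouldAssignAllUserRegions) are invented for illustration. Only the property name hbase.full.cluster.start comes from the patch. Assuming the context lines in the diff sit inside the non-failover branch, the patch as written would call assignAllUserRegions() twice on a clean startup with the property set; folding both conditions into a single check, as below, is one way to make the bulk assignment happen at most once.

```java
import java.util.Properties;

/**
 * Hypothetical model of the startup decision in the patch; NOT the real
 * AssignmentManager. Properties stands in for the HBase Configuration.
 */
class FullClusterStartDecision {

    // Property name taken from the patch.
    static final String FULL_CLUSTER_START_KEY = "hbase.full.cluster.start";

    /** Approximates Configuration.getBoolean(key, false): an absent key means false. */
    static boolean forceFullStart(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(FULL_CLUSTER_START_KEY, "false"));
    }

    /**
     * True when all user regions should be bulk-assigned (retaining locality).
     * One combined check means the bulk assignment runs at most once,
     * even on a clean startup with the flag set.
     */
    static boolean shouldAssignAllUserRegions(boolean failover, Properties conf) {
        return !failover || forceFullStart(conf);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Unclean shutdown seen (failover == true), no flag: regions are left to the balancer.
        System.out.println(shouldAssignAllUserRegions(true, conf)); // false
        conf.setProperty(FULL_CLUSTER_START_KEY, "true");
        // Same situation with the flag set: bulk assignment happens anyway.
        System.out.println(shouldAssignAllUserRegions(true, conf)); // true
    }
}
```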

> If any regionserver gets shutdown uncleanly during full cluster restart, locality looks to be lost
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14129
>                 URL: https://issues.apache.org/jira/browse/HBASE-14129
>             Project: HBase
>          Issue Type: Bug
>            Reporter: churro morales
>             Fix For: 2.0.0, 1.3.0, 0.98.15
>
>         Attachments: HBASE-14129.patch
>
>
> We were doing a cluster restart the other day. Some regionservers did not
> shut down cleanly. Upon restart, our locality went from 99% to 5%. Looking
> at the code, AssignmentManager.joinCluster() calls
> AssignmentManager.processDeadServersAndRegionsInTransition().
> If the failover flag gets set for any reason, it seems we don't call
> assignAllUserRegions(). The balancer then appears to do the work of
> assigning those regions; since we don't use a locality-aware balancer, we
> lost our region locality.
> I don't have a solid grasp of the reasoning behind these checks, but there
> are some potential workarounds:
> 1. After shutting down your cluster, move your WALs aside (and replay them later).
> 2. Clean up your znodes.
> That seems to work, but it requires a lot of manual labor. Another solution,
> which I prefer, would be a flag such as ./start-hbase.sh --clean.
> If the master is started with that flag, we check for it in
> AssignmentManager.processDeadServersAndRegionsInTransition(), and when it is
> set we call assignAllUserRegions() regardless of the failover state.
> I have a patch for the latter solution, assuming I am understanding the
> logic correctly.
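The locality drop described in the issue can be illustrated with a toy model. This is an assumption-laden sketch, not HBase code: each region's data is assumed to live entirely on the server that hosted it before the restart (real HBase locality is measured over HDFS blocks), and a placement that ignores data location is modeled as a uniform random choice of server (the actual balancer is not random). The class LocalityToyModel and its methods are hypothetical. Locality here is the fraction of regions opened on their previous host: retaining the old assignment keeps it at 1.0, while location-unaware placement drops it to roughly 1/servers.

```java
import java.util.Random;

/**
 * Hypothetical toy model: region r's data lives on server home[r];
 * locality = fraction of regions opened on their home server.
 */
class LocalityToyModel {

    static double locality(int[] home, int[] assigned) {
        int local = 0;
        for (int r = 0; r < home.length; r++) {
            if (home[r] == assigned[r]) {
                local++;
            }
        }
        return (double) local / home.length;
    }

    /** Retained assignment: every region goes back to its previous server. */
    static int[] retain(int[] home) {
        return home.clone();
    }

    /** Placement that ignores data location, modeled as a uniform random pick. */
    static int[] localityUnaware(int regions, int servers, long seed) {
        Random rnd = new Random(seed);
        int[] assigned = new int[regions];
        for (int r = 0; r < regions; r++) {
            assigned[r] = rnd.nextInt(servers);
        }
        return assigned;
    }

    /** Spreads regions evenly: region r previously lived on server r % servers. */
    static int[] homes(int regions, int servers) {
        int[] home = new int[regions];
        for (int r = 0; r < regions; r++) {
            home[r] = r % servers;
        }
        return home;
    }

    public static void main(String[] args) {
        int regions = 1000;
        int servers = 20;
        int[] home = homes(regions, servers);
        System.out.printf("retained:         %.2f%n", locality(home, retain(home)));
        System.out.printf("locality-unaware: %.2f%n",
                locality(home, localityUnaware(regions, servers, 42L)));
    }
}
```

With 20 servers the expected locality of the unaware placement is 1/20 = 0.05; that value follows from the model's parameters, not from the reporter's cluster.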



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
