[ https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055201#comment-16055201 ]

chenxu commented on HBASE-18215:
--------------------------------

bq. Can you provide a specific scenario?

Here is a scenario:

{code:title=RSGroupBasedLoadBalancer.java|borderStyle=solid}
public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>> clusterState)
    throws HBaseIOException {
    ...
    List<HRegionInfo> misplacedRegions = correctedState.get(LoadBalancer.BOGUS_SERVER_NAME);
    for (HRegionInfo regionInfo : misplacedRegions) {
      regionPlans.add(new RegionPlan(regionInfo, null, null));
    }
    ...
{code}
If a region is misplaced, its RegionPlan's destination is null, and the following happens:
{code:title=AssignmentManager.java|borderStyle=solid}
private RegionPlan getRegionPlan(final HRegionInfo region,
    final boolean forceNewPlan) throws HBaseIOException {
    ...
      if (forceNewPlan
          || existingPlan == null
          || existingPlan.getDestination() == null
          || !destServers.contains(existingPlan.getDestination())) {
        newPlan = true;
        try {
          randomPlan = new RegionPlan(region, null,
              balancer.randomAssignment(region, destServers));
        } catch (IOException ex) {
          LOG.warn("Failed to create new plan.",ex);
          return null;
        }
        this.regionPlans.put(encodedName, randomPlan);
      }
    }
    if (newPlan) {
      if (randomPlan.getDestination() == null) {
        LOG.warn("Can't find a destination for " + encodedName);
        return null;
      }
      ...
      return randomPlan;
    }
    ...
}
{code}
If balancer.randomAssignment returns null, getRegionPlan also returns null, and the AM 
handles it like this:
{code:title=AssignmentManager.java|borderStyle=solid}
private void assign(RegionState state, boolean forceNewPlan) {
    ...
    if (plan == null) {
      LOG.warn("Unable to determine a plan to assign " + region);
      // For meta region, we have to keep retrying until succeeding
      if (region.isMetaRegion()) {
        ....
      }
      regionStates.updateRegionState(region, State.FAILED_OPEN);
      return;
    }
    ....
}
{code}
The target region is then transitioned to the FAILED_OPEN state.
But if balancer.randomAssignment returns BOGUS_SERVER_NAME, the AM can't handle it: 
the plan has a non-null destination, so an OPEN RPC is sent to BOGUS_SERVER_NAME, which should not happen.
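
A minimal sketch of the change proposed in point 5 of the description, assuming plain String server names in place of HBase's ServerName and HRegionInfo (the real LoadBalancer.randomAssignment takes an HRegionInfo and a List<ServerName>). The class and method names here are hypothetical and the group filtering is simplified; the only point is that returning null, instead of a BOGUS_SERVER_NAME placeholder, lets the existing "destination == null" check in getRegionPlan stop the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical, simplified model of the proposed randomAssignment behavior.
 * Server names are plain Strings instead of HBase ServerName objects.
 */
final class RandomAssignmentSketch {
  private static final Random RANDOM = new Random();

  private RandomAssignmentSketch() {}

  /**
   * Picks a random online server that belongs to the region's group.
   * Returns null (not a BOGUS_SERVER_NAME placeholder) when the group has
   * no online server, so the caller's null check fires.
   */
  static String randomAssignment(List<String> groupServers, List<String> onlineServers) {
    List<String> candidates = new ArrayList<>();
    for (String server : onlineServers) {
      if (groupServers.contains(server)) {
        candidates.add(server);
      }
    }
    if (candidates.isEmpty()) {
      // Let the AM mark the region FAILED_OPEN instead of sending an OPEN RPC.
      return null;
    }
    return candidates.get(RANDOM.nextInt(candidates.size()));
  }

  /** Simplified stand-in for the null check in AssignmentManager.getRegionPlan. */
  static String planDestination(String encodedName, List<String> groupServers,
      List<String> onlineServers) {
    String dest = randomAssignment(groupServers, onlineServers);
    if (dest == null) {
      System.err.println("Can't find a destination for " + encodedName);
      return null;
    }
    return dest;
  }
}
```

With this shape, a region whose group has no online server ends up in the existing FAILED_OPEN path shown in assign(), rather than being handed to a server that does not exist.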

> some advises about refactoring of rsgroup
> -----------------------------------------
>
>                 Key: HBASE-18215
>                 URL: https://issues.apache.org/jira/browse/HBASE-18215
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: chenxu
>         Attachments: HBASE-18215-1.2.4-v1.patch
>
>
> Recently we integrated rsgroup into our cluster, and after integrating it we 
> found some points that could be refactored. Some of these points may not be 
> right, but I think they are worth sharing with you.
> # When hbase.balancer.tablesOnMaster is configured, RSGroupBasedLoadBalancer 
> should consider masterServer assignment first in balanceCluster, 
> roundRobinAssignment, retainAssignment and randomAssignment, the same way 
> BaseLoadBalancer does.
> # Why not use a local file as the persistence layer instead of the rsgroup 
> table? In our implementation, we first modify the local rsgroup file, then load 
> the group info into memory, and after that execute the balancer command; 
> everything works.
> When loading, we do some sanity checks:
> (1) a server cannot be owned by multiple groups
> (2) a table cannot be owned by multiple groups
> (3) if a group has tables, it must also have servers
> (4) the default group must have servers in it
> If the sanity check does not pass, the rest of the process is skipped. Working 
> this way greatly reduces the complexity of the rsgroup implementation: there is 
> no need to wait for the rsgroup table to come online, and methods like 
> moveServers, moveTables, addRSGroup, removeRSGroup and moveServersAndTables can 
> be removed from RSGroupAdminService. Only a refresh method is needed (modify the 
> persistence layer first, then refresh the in-memory state).
> # We should add some group information to the master web UI.
> To do this, RSGroupBasedLoadBalancer should move to the hbase-server module, 
> because MasterStatusTmpl.jamon would depend on it.
> # There may be an issue with RSGroupBasedLoadBalancer.roundRobinAssignment:
> if the assignments of two groups both contain BOGUS_SERVER_NAME, 
> assignments.putAll will overwrite the regions from the previous group.
> # There may be an issue with RSGroupBasedLoadBalancer.randomAssignment:
> when the return value is BOGUS_SERVER_NAME, the AM cannot handle this case. We 
> should return null instead of BOGUS_SERVER_NAME.
> # When RSGroupBasedLoadBalancer.balanceCluster executes, groups are balanced 
> one by one; if there are too many groups, we could balance them in parallel.

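A minimal sketch of the four load-time sanity checks listed in point 2 of the description. The file format is not specified, so this assumes the file has already been parsed into a hypothetical GroupDef (group name, server names, table names); the default group name "default" is also an assumption. Checks are numbered to match the list, and loading would be abandoned if any error is returned.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory form of one group read from a local rsgroup file. */
final class GroupDef {
  final String name;
  final Set<String> servers;
  final Set<String> tables;

  GroupDef(String name, Collection<String> servers, Collection<String> tables) {
    this.name = name;
    this.servers = new HashSet<>(servers);
    this.tables = new HashSet<>(tables);
  }
}

final class RSGroupFileSanityCheck {
  // Assumed name of the default group.
  static final String DEFAULT_GROUP = "default";

  private RSGroupFileSanityCheck() {}

  /** Returns every violation found; an empty list means the file may be loaded. */
  static List<String> check(List<GroupDef> groups) {
    List<String> errors = new ArrayList<>();
    Map<String, String> serverOwner = new HashMap<>();
    Map<String, String> tableOwner = new HashMap<>();
    boolean defaultHasServers = false;
    for (GroupDef group : groups) {
      // (1) a server cannot be owned by multiple groups
      for (String server : group.servers) {
        String previous = serverOwner.putIfAbsent(server, group.name);
        if (previous != null) {
          errors.add("server " + server + " is in both " + previous + " and " + group.name);
        }
      }
      // (2) a table cannot be owned by multiple groups
      for (String table : group.tables) {
        String previous = tableOwner.putIfAbsent(table, group.name);
        if (previous != null) {
          errors.add("table " + table + " is in both " + previous + " and " + group.name);
        }
      }
      // (3) a group with tables must also have servers
      if (!group.tables.isEmpty() && group.servers.isEmpty()) {
        errors.add("group " + group.name + " has tables but no servers");
      }
      if (DEFAULT_GROUP.equals(group.name) && !group.servers.isEmpty()) {
        defaultHasServers = true;
      }
    }
    // (4) the default group must have servers
    if (!defaultHasServers) {
      errors.add("group " + DEFAULT_GROUP + " has no servers");
    }
    return errors;
  }
}
```

Collecting all violations, rather than failing on the first one, lets an operator fix the whole file in one pass before calling the refresh method.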

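A sketch of one way to avoid the overwrite described in point 4: merge each group's server-to-regions map into the combined result instead of calling Map.putAll. The helper name mergeInto is hypothetical and the types are generic, so it does not claim to match the real roundRobinAssignment signature; with HBase types S would be ServerName and R would be HRegionInfo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class AssignmentMerge {
  private AssignmentMerge() {}

  /**
   * Adds one group's server -> regions assignment to the combined result.
   * Unlike Map.putAll, a key that is already present (for example a shared
   * BOGUS_SERVER_NAME placeholder) keeps its earlier regions; the new
   * regions are appended to a list owned by the combined map.
   */
  static <S, R> void mergeInto(Map<S, List<R>> combined, Map<S, List<R>> groupResult) {
    for (Map.Entry<S, List<R>> entry : groupResult.entrySet()) {
      combined.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).addAll(entry.getValue());
    }
  }
}
```

Because the combined map always creates its own ArrayList for a new key, it never aliases or mutates the per-group lists it copies from.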

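A sketch of the parallel balancing suggested in point 6, using a fixed-size ExecutorService from java.util.concurrent. The group and plan types are generic placeholders and balanceGroup is a hypothetical per-group function; the sketch assumes that function is thread-safe, which would need to be verified for the real internal balancer before doing this in RSGroupBasedLoadBalancer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelGroupBalance {
  private ParallelGroupBalance() {}

  /**
   * Balances every group on a pool of `threads` workers (must be >= 1) and
   * concatenates the resulting plans in the original group order.
   */
  static <G, P> List<P> balanceAll(List<G> groups, Function<G, List<P>> balanceGroup,
      int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<List<P>>> futures = new ArrayList<>();
      for (G group : groups) {
        Callable<List<P>> task = () -> balanceGroup.apply(group);
        futures.add(pool.submit(task));
      }
      List<P> plans = new ArrayList<>();
      for (Future<List<P>> future : futures) {
        try {
          plans.addAll(future.get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          throw new IllegalStateException("interrupted while balancing groups", e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("balancing a group failed", e.getCause());
        }
      }
      return plans;
    } finally {
      pool.shutdown();
    }
  }
}
```

Waiting on the futures in submission order keeps the output deterministic even though groups may finish in any order.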
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)