[ 
https://issues.apache.org/jira/browse/HADOOP-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517665
 ] 

Jim Kellerman commented on HADOOP-1678:
---------------------------------------

The region server that performed the split should serve the lower half of the 
split region.

It should update the meta with information about the two new regions and change 
the region info for the old parent region to indicate it is being shared by the 
two children.

When it reports the split to the master, the master will assign the upper half 
region to the most lightly loaded server.

In order to determine load, HServerInfo should be augmented to include the 
number of regions being served by the region server, along with other 
statistics such as Runtime.freeMemory(), Runtime.totalMemory(), 
Runtime.maxMemory(), Runtime.availableProcessors(), request rate, etc. These 
statistics can be used to determine a region server's "load factor". (Actually, 
the load factor is probably what needs to go into the HServerInfo object: the 
region server can compute its load factor before sending its regular heartbeat 
message.)
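
As a rough illustration of the statistics listed above, a region server could 
gather them into one value object before each heartbeat. This is only a sketch: 
LoadSnapshot and its fields are hypothetical names, not existing HBase classes, 
and the region count and request rate are assumed to be tracked elsewhere by 
the region server. Only the java.lang.Runtime calls are real API.

```java
// Hypothetical sketch; LoadSnapshot is an illustrative name, not HBase code.
final class LoadSnapshot {
    final int regionCount;           // regions currently served (supplied by caller)
    final long freeMemory;           // Runtime.freeMemory(), in bytes
    final long totalMemory;          // Runtime.totalMemory(), in bytes
    final long maxMemory;            // Runtime.maxMemory(), in bytes
    final int availableProcessors;   // Runtime.availableProcessors()
    final double requestsPerSecond;  // request rate (supplied by caller)

    private LoadSnapshot(int regionCount, long freeMemory, long totalMemory,
                         long maxMemory, int availableProcessors,
                         double requestsPerSecond) {
        this.regionCount = regionCount;
        this.freeMemory = freeMemory;
        this.totalMemory = totalMemory;
        this.maxMemory = maxMemory;
        this.availableProcessors = availableProcessors;
        this.requestsPerSecond = requestsPerSecond;
    }

    // Reads the JVM statistics at call time; the other two values are
    // assumed to be maintained by the region server itself.
    static LoadSnapshot capture(int regionCount, double requestsPerSecond) {
        Runtime rt = Runtime.getRuntime();
        return new LoadSnapshot(regionCount, rt.freeMemory(), rt.totalMemory(),
                rt.maxMemory(), rt.availableProcessors(), requestsPerSecond);
    }
}
```

A region server would call something like LoadSnapshot.capture(regions, rate) 
just before building its heartbeat, then either ship the snapshot or reduce it 
to a single load factor.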

Should the master miss the split message, it will assign the upper child region 
during the next meta scan (since the region server updated the meta before 
reporting the split to the master).

The master will need to track the load factor of each server so that it can 
assign new regions to the server with the smallest load factor.
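
A minimal sketch of that bookkeeping on the master side might look like the 
following. LoadTracker and its method names are hypothetical, and it assumes 
each heartbeat carries a server name and a single numeric load factor where a 
smaller value means a lighter load.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; LoadTracker is an illustrative name, not HBase code.
final class LoadTracker {
    // Latest load factor reported by each region server, keyed by server name.
    private final Map<String, Double> loadByServer =
            new ConcurrentHashMap<String, Double>();

    // Called whenever a heartbeat arrives; overwrites the previous value.
    void recordHeartbeat(String serverName, double loadFactor) {
        loadByServer.put(serverName, loadFactor);
    }

    // Called when a server's lease expires so it is no longer a candidate.
    void serverLost(String serverName) {
        loadByServer.remove(serverName);
    }

    // Returns the server with the smallest load factor, or null if no
    // server has reported yet.
    String leastLoadedServer() {
        String best = null;
        double bestLoad = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : loadByServer.entrySet()) {
            if (e.getValue() < bestLoad) {
                bestLoad = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

The master would consult leastLoadedServer() when it needs a home for the 
upper child region, whether it learned of the split from the split report or 
from the next meta scan.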

Periodically, the master should run a thread that attempts to re-balance the 
load on the cluster. Without detailed statistics such as the request rate per 
region, however, it would be hard for the master to determine which regions 
should be moved to a different server in order to most effectively balance the 
load. For example, a server could be serving 1000 regions which are receiving 
little traffic and still be assigned another region without greatly affecting 
its performance. Another server could be serving only two heavy-traffic regions 
yet be so heavily loaded that it should be relieved of one of them to more 
effectively balance load.

In the near term, computing a load factor from percentage of free memory and 
request rate is probably the best metric for determining which server should be 
assigned a new region. As we gain more experience with HBase performance we can 
include some of the other factors mentioned above.
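
One possible way to combine the two near-term inputs is sketched below. The 
formula is purely an assumption for illustration: the equal weighting, the 
request rate ceiling of 1000 requests per second, and the LoadFactor class 
itself are all made up and would need tuning against real HBase workloads.

```java
// Hypothetical sketch; the weights and ceiling are arbitrary assumptions.
final class LoadFactor {
    // Assumed relative importance of memory pressure versus request rate.
    static final double MEMORY_WEIGHT = 0.5;
    // Assumed request rate (per second) treated as "fully loaded".
    static final double REQUEST_RATE_CEILING = 1000.0;

    // Fraction of the maximum heap still available, computed from the
    // values returned by Runtime.freeMemory(), totalMemory(), maxMemory().
    static double freeMemoryFraction(long free, long total, long max) {
        long used = total - free;
        return (double) (max - used) / (double) max;
    }

    // Returns a value in [0, 1]; 0 means idle, 1 means saturated.
    static double compute(double freeMemoryFraction, double requestsPerSecond) {
        double memoryPressure = 1.0 - clamp(freeMemoryFraction);
        double requestPressure = clamp(requestsPerSecond / REQUEST_RATE_CEILING);
        return MEMORY_WEIGHT * memoryPressure
                + (1.0 - MEMORY_WEIGHT) * requestPressure;
    }

    private static double clamp(double v) {
        return Math.max(0.0, Math.min(1.0, v));
    }
}
```

With these assumed constants, a server with half its heap free that handles 
500 requests per second gets a load factor of 0.5. Other inputs, such as 
region count or processor count, could be folded in later as additional 
weighted terms.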

> [hbase] On region split, master should designate which host should serve 
> daughter splits
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1678
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1678
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>
> On region split, the daughter regions are deployed on the same host as served 
> the split parent.  This makes it so currently (unless the cluster is 
> restarted), as a table grows, all its regions remain on the one server.
> Instead, jurisdiction over who serves daughter splits should be passed to the 
> master.  If possible, before making a determination, the master should take 
> into consideration current cluster loadings and region distribution.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
