[
https://issues.apache.org/jira/browse/HADOOP-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517665
]
Jim Kellerman commented on HADOOP-1678:
---------------------------------------
The region server that performed the split should serve the lower half of the
split region.
It should update the meta with information about the two new regions and change
the region info for the old parent region to indicate it is being shared by the
two children.
When it reports the split to the master, the master will assign the upper half
region to the most lightly loaded server.
In order to determine load, HServerInfo should be augmented to include the
number of regions being served by the region server, along with other
statistics such as Runtime.freeMemory(), Runtime.totalMemory(),
Runtime.maxMemory(), Runtime.availableProcessors(), request rate, etc. These
statistics can be used to determine a region server's "load factor". (Actually,
the load factor is probably what needs to go into the HServerInfo object: the
region server can compute its load factor before sending its regular heartbeat
message.)
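As a rough illustration of the statistics listed above, here is a minimal
sketch of a snapshot a region server could take before each heartbeat. The
class ServerLoadSnapshot and its fields are hypothetical and are not part of
HServerInfo; only the java.lang.Runtime calls are real. The region count and
request rate are assumed to be supplied by the caller.

```java
// Hypothetical sketch: ServerLoadSnapshot is not an existing HBase class.
final class ServerLoadSnapshot {
    final int regionCount;           // regions currently served (supplied by caller)
    final long freeMemory;           // Runtime.freeMemory(), in bytes
    final long totalMemory;          // Runtime.totalMemory(), in bytes
    final long maxMemory;            // Runtime.maxMemory(), in bytes
    final int availableProcessors;   // Runtime.availableProcessors()
    final double requestsPerSecond;  // request rate (supplied by caller)

    ServerLoadSnapshot(int regionCount, long freeMemory, long totalMemory,
                       long maxMemory, int availableProcessors,
                       double requestsPerSecond) {
        this.regionCount = regionCount;
        this.freeMemory = freeMemory;
        this.totalMemory = totalMemory;
        this.maxMemory = maxMemory;
        this.availableProcessors = availableProcessors;
        this.requestsPerSecond = requestsPerSecond;
    }

    /** Reads the JVM statistics at the moment of the call. */
    static ServerLoadSnapshot capture(int regionCount, double requestsPerSecond) {
        Runtime rt = Runtime.getRuntime();
        return new ServerLoadSnapshot(regionCount, rt.freeMemory(),
                rt.totalMemory(), rt.maxMemory(), rt.availableProcessors(),
                requestsPerSecond);
    }

    /** Fraction of the maximum heap that is not currently in use. */
    double freeMemoryFraction() {
        long used = totalMemory - freeMemory;
        return (double) (maxMemory - used) / (double) maxMemory;
    }
}
```

Measuring free memory against maxMemory() rather than totalMemory() is a
design choice of this sketch: the heap can still grow up to the maximum, so
the unallocated part is counted as available.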
Should the master miss the split message, it will assign the upper child region
during the next meta scan (since the region server updated the meta before
reporting the split to the master).
The master will need to track the load factor of each server so that it can
assign new regions to the server with the smallest load factor.
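The bookkeeping described above could be sketched roughly as follows. This is
an assumption-laden illustration, not the master's actual code: the class
LoadTracker and its method names are hypothetical, servers are identified by
a plain string, and the load factor is assumed to arrive as a double with
each heartbeat.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: LoadTracker is not an existing HBase class.
final class LoadTracker {
    // Most recent load factor reported by each live region server.
    private final Map<String, Double> loadByServer =
            new ConcurrentHashMap<String, Double>();

    /** Records the load factor carried by a server's heartbeat. */
    void onHeartbeat(String serverName, double loadFactor) {
        loadByServer.put(serverName, loadFactor);
    }

    /** Forgets a server whose lease expired or that shut down. */
    void onServerLost(String serverName) {
        loadByServer.remove(serverName);
    }

    /** Returns the server with the smallest load factor, if any is known. */
    Optional<String> leastLoadedServer() {
        String best = null;
        double bestLoad = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : loadByServer.entrySet()) {
            if (best == null || e.getValue() < bestLoad) {
                best = e.getKey();
                bestLoad = e.getValue();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With something like this in place, the master would call leastLoadedServer()
both when a split is reported and when a meta scan finds an unassigned upper
child region.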
Periodically, the master should run a thread that attempts to re-balance the
load on the cluster. Without detailed statistics such as the request rate per
region, however, it would be hard for the master to determine which regions
should be moved to a different server in order to most effectively balance the
load. For example, a server could be serving 1000 regions that are receiving
little traffic and still be assigned another region without greatly affecting
its performance. Another server could be serving two
heavy traffic regions yet be so heavily loaded that it should be relieved of
one of the regions to more effectively balance load.
In the near term, computing a load factor from percentage of free memory and
request rate is probably the best metric for determining which server should be
assigned a new region. As we gain more experience with HBase performance we can
include some of the other factors mentioned above.
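One possible shape for such a near-term load factor is sketched below. The
formula is purely an assumption for illustration: the equal weights, the
saturation request rate used for normalization, and the class name LoadFactor
are all hypothetical and would need tuning against real HBase workloads.

```java
// Hypothetical sketch: the weights and the formula are assumptions.
final class LoadFactor {
    static final double MEMORY_WEIGHT = 0.5;   // assumed weight
    static final double REQUEST_WEIGHT = 0.5;  // assumed weight

    /**
     * Combines memory pressure and request pressure into a value in [0, 1].
     *
     * @param freeMemoryFraction fraction of heap still free, in [0, 1]
     * @param requestsPerSecond  observed request rate
     * @param saturationRate     assumed rate at which a server is "full"
     */
    static double compute(double freeMemoryFraction, double requestsPerSecond,
                          double saturationRate) {
        if (saturationRate <= 0.0) {
            throw new IllegalArgumentException("saturationRate must be positive");
        }
        double memoryPressure = 1.0 - clamp(freeMemoryFraction);
        double requestPressure = clamp(requestsPerSecond / saturationRate);
        return MEMORY_WEIGHT * memoryPressure + REQUEST_WEIGHT * requestPressure;
    }

    /** Limits a value to the range [0, 1]. */
    private static double clamp(double v) {
        return Math.max(0.0, Math.min(1.0, v));
    }
}
```

Note that the region count does not enter this formula. Under these
assumptions, a server with many idle regions (say 80% free memory and 50
requests per second against a saturation rate of 1000) scores lower than a
server with two busy regions (20% free memory and 900 requests per second),
so the former would be preferred for a new region.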
> [hbase] On region split, master should designate which host should serve
> daughter splits
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-1678
> URL: https://issues.apache.org/jira/browse/HADOOP-1678
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Reporter: stack
>
> On region split, the daughter regions are deployed on the same host as served
> the split parent. This makes it so currently (unless the cluster is
> restarted), as a table grows, all its regions remain on the one server.
> Instead, jurisdiction over who serves daughter splits should be passed to the
> master. If possible, before making a determination, the master should take
> into consideration current cluster loadings and region distribution.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.