[
https://issues.apache.org/jira/browse/HBASE-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858876#action_12858876
]
Jan Lukavsky commented on HBASE-57:
-----------------------------------
Hi,
I suspect this issue is causing us trouble when running Map/Reduce jobs with
HBase as the data source. TableInputFormat tells the JobTracker that regions are
data-local to the RegionServer that serves them. IMO this causes a serious load
imbalance on small clusters (ours has about 10 nodes), because the RegionServer
may (and probably will) contact a DataNode on a different machine. Thus, in the
extreme case, a single DataNode may (at some point) be handling reads from all
the Mappers.
If regions were assigned to the RegionServer that holds the most of their
blocks, I suppose this imbalance would be minimized. Stack's proposed solution
seems fairly appropriate to me.
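
To make the "most blocks" idea concrete, here is a minimal sketch in Python. It
is not the HBase master's assignment code; the function name, its parameters
and the tie-breaking rule (fewest regions already assigned, then hostname) are
assumptions made up for illustration. It only assumes that, for each HDFS block
of a region, we know which hosts hold a replica.

```python
from collections import Counter


def pick_server_by_locality(block_hosts, live_servers, load=None):
    """Pick the live RegionServer host that holds the most blocks of a region.

    block_hosts  -- one list of replica hostnames per HDFS block of the region
    live_servers -- non-empty list of hostnames currently running a RegionServer
    load         -- optional dict: hostname -> regions already assigned
                    (hypothetical tie-breaker, not taken from the issue)
    """
    load = load or {}
    local_blocks = Counter()
    for replicas in block_hosts:
        # Count each host at most once per block, and only live servers.
        for host in set(replicas):
            if host in live_servers:
                local_blocks[host] += 1
    # Most local blocks first, then lowest load, then hostname so the
    # result is deterministic.
    return min(
        live_servers,
        key=lambda host: (-local_blocks[host], load.get(host, 0), host),
    )


if __name__ == "__main__":
    # Three blocks, three replicas each; only n1..n3 run a RegionServer.
    blocks = [["n1", "n2", "n3"], ["n2", "n4", "n5"], ["n2", "n3", "n6"]]
    print(pick_server_by_locality(blocks, ["n1", "n2", "n3"]))
```

With an assignment like this, the host that TableInputFormat reports for a
region would more often be one where the data really is local. A real
implementation would also have to weigh rack awareness and overall load, which
this sketch deliberately leaves out.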
> [hbase] Master should allocate regions to regionservers based upon data
> locality and rack awareness
> ---------------------------------------------------------------------------------------------------
>
> Key: HBASE-57
> URL: https://issues.apache.org/jira/browse/HBASE-57
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: master
> Affects Versions: 0.2.0
> Reporter: stack
>
> Currently, regions are assigned to regionservers based on a basic load
> attribute. A factor to include in the assignment calculation is the location
> of the region in hdfs; i.e. the servers hosting region replicas. If the cluster
> is such that regionservers are being run on the same nodes as those running
> hdfs, then ideally the regionserver for a particular region should be running
> on a server that hosts a replica of that region.