[
https://issues.apache.org/jira/browse/HBASE-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Yu updated HBASE-4114:
--------------------------
Status: Patch Available (was: Open)
> Metrics for HFile HDFS block locality
> -------------------------------------
>
> Key: HBASE-4114
> URL: https://issues.apache.org/jira/browse/HBASE-4114
> Project: HBase
> Issue Type: Improvement
> Components: metrics, regionserver
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: HBASE-4114-trunk.patch, HBASE-4114-trunk.patch,
> HBASE-4114-trunk.patch, HBASE-4114-trunk.patch, HBASE-4114-trunk.patch
>
>
> Normally, when HBase and HDFS run in the same cluster (e.g., the region
> server runs on the datanode), we get reasonably good data locality, as
> explained by Lars. Work has also been done by Jonathan to address the startup
> situation.
> There are scenarios where a region can be on a different machine from the
> machines that hold the underlying HFile blocks, at least for some period of
> time. This hurts the performance of whole-table scans and MapReduce jobs
> during that time.
> 1. After the load balancer moves a region and before compaction runs on that
> region (which generates a new HFile on the new region server), the HDFS
> blocks can be remote.
> 2. When a machine is added or removed, HBase's region assignment
> policy is different from HDFS's block reassignment policy.
> 3. Even if there is not much HBase activity, HDFS can rebalance HFile
> blocks as other non-HBase applications push other data to HDFS.
> A lot has been or will be done in the load balancer, as summarized by Ted. I
> am curious whether HFile HDFS block locality should be used as another factor
> here.
> I have done some experiments on how HDFS block locality can impact MapReduce
> latency. First we need to define a metric to measure HFile data locality.
> Metric definition:
> For a given table, region server, or region, we can define the
> following. The higher the value, the more local the HFiles are from the
> region server's point of view.
> HFile locality index = (Total number of HDFS blocks that can be retrieved
> locally by the region server) / (Total number of HDFS blocks for all HFiles)
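The definition above can be sketched as a small calculation. This is only an illustration, not the code in the attached patch: the function name `locality_index`, the host names, and the input shape (one list of replica hosts per HDFS block) are hypothetical, and it assumes a block counts as "local" when at least one replica lives on the region server's host. Returning 0.0 for an empty input is also an assumption.

```python
from typing import Iterable, Sequence


def locality_index(block_hosts: Iterable[Sequence[str]], server_host: str) -> float:
    """Fraction of HDFS blocks that have at least one replica on server_host.

    block_hosts: one entry per HDFS block, listing the hosts holding a replica.
    server_host: host name of the region server (hypothetical naming).
    """
    total = 0
    local = 0
    for hosts in block_hosts:
        total += 1
        if server_host in hosts:
            local += 1
    # Assumption: with no blocks at all, report 0.0 rather than raising.
    return local / total if total else 0.0


# Hypothetical layout: four blocks, each replicated on three hosts.
blocks = [
    ["rs1", "rs2", "rs3"],
    ["rs2", "rs3", "rs4"],
    ["rs1", "rs4", "rs5"],
    ["rs3", "rs4", "rs5"],
]
print(locality_index(blocks, "rs1"))  # 2 of the 4 blocks have a replica on rs1
```

The same function would apply at table, region server, or region granularity, depending on which HFiles' blocks are passed in.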
> Test Results:
> This shows how HFile locality can impact latency. It is based on a
> table with 1M rows, 36KB per row; regions are evenly distributed. The map
> job is RowCounter.
> HFile Locality Index    Map job latency (in sec)
> 28%                     157
> 36%                     150
> 47%                     142
> 61%                     133
> 73%                     122
> 89%                     103
> 99%                     95
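To make the trend in the table easier to read, the following sketch computes the latency reduction at each locality level relative to the lowest-locality run. The numbers are copied from the table above; the function name `latency_reduction` is hypothetical, and taking the 28% run as the baseline is a choice made here for illustration. By this measure, going from 28% to 99% locality cuts the map job latency by roughly 39%.

```python
# (locality index in percent, map job latency in seconds), from the table above.
results = [(28, 157), (36, 150), (47, 142), (61, 133), (73, 122), (89, 103), (99, 95)]


def latency_reduction(rows):
    """Return (locality, fractional latency reduction) relative to the first row."""
    baseline = rows[0][1]
    return [(loc, (baseline - lat) / baseline) for loc, lat in rows]


reductions = latency_reduction(results)
for loc, frac in reductions:
    print(f"{loc}% locality: {frac:.1%} lower latency than the 28% run")
```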
> So the first suggestion is to expose the HFile locality index as a new region
> server metric. Ideally, we could also measure the HFile locality
> index at a per-map-job level.
> Whether and when we should include it as another factor for the load balancer
> is a separate work item. It is unclear how the load balancer can take
> various factors into account to come up with the best balancing strategy.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira