Metrics for HFile HDFS block locality
-------------------------------------

                 Key: HBASE-4114
                 URL: https://issues.apache.org/jira/browse/HBASE-4114
             Project: HBase
          Issue Type: Improvement
          Components: metrics, regionserver
         Environment: Normally, when we put HBase and HDFS in the same cluster 
(e.g., the region server runs on the DataNode), we have reasonably good data 
locality, as explained by Lars. Work has also been done by Jonathan to address 
the startup situation.

There are scenarios where a region can be on a different machine from the 
machines that hold the underlying HFile blocks, at least for some period of 
time. During that time, this hurts the performance of whole-table scans and 
MapReduce jobs.

1.      After the load balancer moves a region and before a compaction runs 
on that region (which writes a new HFile on the new region server), its HDFS 
blocks can be remote.
2.      When a machine is added or removed, HBase's region assignment policy 
is different from HDFS's block reassignment policy.
3.      Even if there is not much HBase activity, HDFS can rebalance HFile 
blocks as other non-HBase applications push other data to HDFS.

A lot has been or will be done in the load balancer, as summarized by Ted. I 
am curious whether HFile HDFS block locality should be used as another factor 
there.

I have done some experiments on how HDFS block locality can impact MapReduce 
latency. First we need to define a metric to measure HFile data locality.

Metric definition:

For a given table, region server, or region, we can define the following. 
The higher the value, the more local the HFiles are from the region server's 
point of view.

HFile locality index = ( Total number of HDFS blocks that can be retrieved 
locally by the region server ) / ( Total number of HDFS blocks for all HFiles )
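
As an illustration of the definition above, here is a minimal Python sketch of 
the computation. It is not HBase code: the function name hfile_locality_index, 
the input shape (one set of replica host names per HDFS block), and the host 
names are all hypothetical, and treating "no blocks" as 0.0 is an assumption 
the definition does not cover.

```python
from typing import Iterable, Set


def hfile_locality_index(block_replica_hosts: Iterable[Set[str]],
                         region_server_host: str) -> float:
    """Fraction of HDFS blocks that have a replica on the region server's host.

    block_replica_hosts: one set of host names per HDFS block, listing the
    hosts that hold a replica of that block (hypothetical input shape).
    """
    total = 0
    local = 0
    for hosts in block_replica_hosts:
        total += 1
        # A block counts as local if any replica lives on the region server host.
        if region_server_host in hosts:
            local += 1
    # Assumption: with no blocks at all, report 0.0 instead of dividing by zero.
    return local / total if total else 0.0


if __name__ == "__main__":
    # Four blocks, two of which have a replica on host "rs1".
    blocks = [
        {"rs1", "dn2", "dn3"},
        {"dn2", "dn3", "dn4"},
        {"rs1", "dn4", "dn5"},
        {"dn3", "dn4", "dn5"},
    ]
    print(hfile_locality_index(blocks, "rs1"))  # 0.5
```

The same function covers the per-region, per-region-server, and per-table 
cases; only the set of blocks passed in changes.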

Test Results:
This shows how HFile locality can impact latency. It is based on a table with 
1M rows, 36KB per row; regions are evenly distributed. The map job is 
RowCounter.

HFile Locality Index    Map job latency ( in sec )
28%                     157
36%                     150
47%                     142
61%                     133
73%                     122
89%                     103
99%                     95
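
To make the trend in these numbers easier to read, the following Python sketch 
fits a straight line to the seven data points and computes the overall latency 
reduction between the lowest and highest locality. The data comes from the 
table above; the linear model and the helper name fit_line are assumptions 
made for illustration only, not part of the original experiment.

```python
from typing import List, Tuple

# (HFile locality index in %, map job latency in seconds), from the table.
RESULTS: List[Tuple[float, float]] = [
    (28, 157), (36, 150), (47, 142), (61, 133),
    (73, 122), (89, 103), (99, 95),
]


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def relative_reduction(points: List[Tuple[float, float]]) -> float:
    """Latency reduction from the lowest-locality run to the highest-locality run."""
    ordered = sorted(points)
    worst = ordered[0][1]
    best = ordered[-1][1]
    return (worst - best) / worst


if __name__ == "__main__":
    slope, intercept = fit_line(RESULTS)
    print(f"slope: {slope:.3f} sec per percentage point of locality")
    print(f"intercept: {intercept:.1f} sec")
    print(f"reduction from 28% to 99% locality: {relative_reduction(RESULTS):.1%}")
```

On this data the fitted slope is roughly -0.87 seconds per percentage point, 
and going from 28% to 99% locality cuts the latency by about 39% (157 sec to 
95 sec). With only seven points from a single table, this is a rough summary 
and not a general model.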

So the first suggestion is to expose the HFile locality index as a new region 
server metric. It would be ideal if we could somehow also measure the HFile 
locality index at a per-map-job level.

Whether and when we should include it as another factor for the load balancer 
is a separate work item. It is unclear how the load balancer can weigh various 
factors to come up with the best balancing strategy.
            Reporter: Ming Ma
            Assignee: Ming Ma




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
