[ 
https://issues.apache.org/jira/browse/HBASE-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081384#comment-13081384
 ] 

Ted Yu commented on HBASE-4114:
-------------------------------

+1 on patch.

> Metrics for HFile HDFS block locality
> -------------------------------------
>
>                 Key: HBASE-4114
>                 URL: https://issues.apache.org/jira/browse/HBASE-4114
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics, regionserver
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HBASE-4114-trunk.patch, HBASE-4114-trunk.patch, 
> HBASE-4114-trunk.patch, HBASE-4114-trunk.patch, HBASE-4114-trunk.patch
>
>
> Normally, when we put HBase and HDFS in the same cluster (e.g., the region 
> server runs on the datanode), we have reasonably good data locality, as 
> explained by Lars. Work has also been done by Jonathan to address the startup 
> situation.
> There are scenarios where regions can be on a different machine from the 
> machines that hold the underlying HFile blocks, at least for some period of 
> time. This affects the performance of whole-table scans and MapReduce jobs 
> during that time.
> 1.    After the load balancer moves a region and before a compaction on that 
> region (which generates a new HFile on the new region server), the HDFS 
> blocks can be remote.
> 2.    When a machine is added or removed, HBase's region assignment 
> policy is different from HDFS's block reassignment policy.
> 3.    Even if there is not much HBase activity, HDFS can rebalance HFile 
> blocks as other non-HBase applications push other data to HDFS.
> A lot has been or will be done in the load balancer, as summarized by Ted. I 
> am curious whether HFile HDFS block locality should be used as another factor 
> here.
> I have done some experiments on how HDFS block locality can impact MapReduce 
> latency. First we need to define a metric to measure HFile data locality.
> Metric definition:
> For a given table, region server, or region, we can define the 
> following. The higher the value, the more local the HFiles are from the 
> region server's point of view.
> HFile locality index = ( Total number of HDFS blocks that can be retrieved 
> locally by the region server ) / ( Total number of HDFS blocks for all HFiles 
> )
> Test Results:
> These results show how HFile locality can impact latency. They are based on a 
> table with 1M rows, 36KB per row; regions are evenly distributed. The map 
> job is RowCounter.
> HFile Locality Index  Map job latency ( in sec )
> 28%                   157
> 36%                   150
> 47%                   142
> 61%                   133
> 73%                   122
> 89%                   103
> 99%                   95
> So the first suggestion is to expose the HFile locality index as a new region 
> server metric. It would be ideal if we could somehow measure the HFile 
> locality index at the per-map-job level.
> Whether and when we should include it as another factor for the load 
> balancer is a separate work item. It is unclear how the load balancer can take 
> various factors into account to come up with the best balancing strategy.
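
For illustration only, the locality index defined in the quoted description can be sketched as below. This is not code from the attached patch: the `HdfsBlock` record, the `locality_index` function, and the host names are hypothetical, and returning 0.0 when there are no blocks is an assumption, since the description does not cover that case.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class HdfsBlock:
    """Hypothetical record: one HDFS block of an HFile and the hosts holding a replica."""
    hfile: str
    replica_hosts: Sequence[str]


def locality_index(blocks: Iterable[HdfsBlock], region_server_host: str) -> float:
    """Fraction of HDFS blocks that have a replica on the region server's host.

    Follows the quoted definition: local blocks / total blocks over all HFiles.
    Assumption: an empty block list yields 0.0.
    """
    total = 0
    local = 0
    for block in blocks:
        total += 1
        if region_server_host in block.replica_hosts:
            local += 1
    return local / total if total else 0.0


# Made-up example: four blocks, three of which have a replica on host "rs1".
example_blocks = [
    HdfsBlock("hfile-a", ["rs1", "rs2", "rs3"]),
    HdfsBlock("hfile-a", ["rs2", "rs3", "rs4"]),
    HdfsBlock("hfile-b", ["rs1", "rs4", "rs5"]),
    HdfsBlock("hfile-b", ["rs1", "rs2", "rs5"]),
]
example_index = locality_index(example_blocks, "rs1")
print(f"{example_index:.0%}")
```

The same function applies at the region, region server, or table level; only the set of blocks passed in changes.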

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
