[ https://issues.apache.org/jira/browse/HBASE-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081384#comment-13081384 ]
Ted Yu commented on HBASE-4114:
-------------------------------

+1 on patch.

> Metrics for HFile HDFS block locality
> -------------------------------------
>
>                 Key: HBASE-4114
>                 URL: https://issues.apache.org/jira/browse/HBASE-4114
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics, regionserver
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HBASE-4114-trunk.patch, HBASE-4114-trunk.patch,
> HBASE-4114-trunk.patch, HBASE-4114-trunk.patch, HBASE-4114-trunk.patch
>
>
> Normally, when HBase and HDFS run in the same cluster (e.g., the region
> server runs on the datanode), we get reasonably good data locality, as
> explained by Lars. Work has also been done by Jonathan to address the
> startup situation.
>
> There are scenarios where regions can be on a different machine from the
> machines that hold the underlying HFile blocks, at least for some period of
> time. This affects the performance of whole-table scans and MapReduce jobs
> during that time.
>
> 1. After the load balancer moves a region, and before a compaction on that
> region generates a new HFile on the new region server, the HDFS blocks can
> be remote.
> 2. When a machine is added or removed, HBase's region assignment policy
> differs from HDFS's block reassignment policy.
> 3. Even if there is not much HBase activity, HDFS can rebalance HFile
> blocks as other non-HBase applications push other data to HDFS.
>
> A lot has been or will be done in the load balancer, as summarized by Ted.
> I am curious whether HFile HDFS block locality should be used as another
> factor here.
>
> I have done some experiments on how HDFS block locality can impact
> MapReduce latency. First we need to define a metric to measure HFile data
> locality.
>
> Metric definition:
>
> For a given table, region server, or region, we can define the following.
> The higher the value, the more local the HFiles are from the region
> server's point of view.
>
> HFile locality index = ( Total number of HDFS blocks that can be retrieved
> locally by the region server ) / ( Total number of HDFS blocks for all
> HFiles )
>
> Test Results:
>
> This shows how HFile locality can impact latency. It is based on a table
> with 1M rows, 36KB per row; regions are evenly distributed. The map job is
> RowCounter.
>
> HFile Locality Index    Map job latency ( in sec )
> 28%                     157
> 36%                     150
> 47%                     142
> 61%                     133
> 73%                     122
> 89%                     103
> 99%                     95
>
> So the first suggestion is to expose the HFile locality index as a new
> region server metric. Ideally, we could also measure the HFile locality
> index at a per-map-job level.
>
> Regarding if/when we should include it as another factor for the load
> balancer, that will be a separate work item. It is unclear how the load
> balancer can take various factors into account to come up with the best
> balancing strategy.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
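
For reference, the HFile locality index defined in the issue description can be sketched in a few lines. This is only an illustration, not the code in the attached patch: the `Block` type and the `locality_index` function are hypothetical names, "retrievable locally" is assumed to mean that a replica of the block lives on the region server's own host, and returning 0.0 when there are no blocks is also an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Block:
    """One HDFS block of an HFile (hypothetical stand-in type)."""
    hfile: str
    replica_hosts: FrozenSet[str]  # hosts that hold a replica of this block


def locality_index(blocks: Iterable[Block], region_server_host: str) -> float:
    """Fraction of HFile blocks with a replica on the region server's host.

    Implements: (blocks retrievable locally) / (total blocks for all HFiles).
    """
    blocks = list(blocks)
    if not blocks:
        # Assumption: with no blocks at all, report 0.0 rather than raising.
        return 0.0
    # Assumption: "local" means a replica is stored on the same host.
    local = sum(1 for b in blocks if region_server_host in b.replica_hosts)
    return local / len(blocks)
```

Passing the blocks of a single region, of all regions on one region server, or of all regions of a table gives the index at the corresponding level.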