[ https://issues.apache.org/jira/browse/HBASE-21672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737370#comment-16737370 ]
Nihal Jain edited comment on HBASE-21672 at 1/8/19 6:23 PM:
------------------------------------------------------------

{quote}Shouldn't this either be a no-op for filesystems that don't have locality, or something we can just ask the filesystem?
{quote}
The filesystem does not directly return locality as such. We have some logic in HBase to calculate it, based on {{HDFSBlocksDistribution}} information, which we create from the block location information returned by the underlying fs.
{code:java}
static public HDFSBlocksDistribution computeHDFSBlocksDistribution(
    final FileSystem fs, FileStatus status, long start, long length) throws IOException {
  HDFSBlocksDistribution blocksDistribution = new HDFSBlocksDistribution();
  // Ask the filesystem where the blocks covering [start, start + length) are stored.
  BlockLocation[] blockLocations = fs.getFileBlockLocations(status, start, length);
  for (BlockLocation bl : blockLocations) {
    // Credit every host that holds this block with the block's length.
    String[] hosts = bl.getHosts();
    long len = bl.getLength();
    blocksDistribution.addHostsAndBlockWeight(hosts, len);
  }
  return blocksDistribution;
}
{code}
I think this solution should be fine and useful: when we know the fs cannot give us meaningful locality, creating this {{HDFSBlocksDistribution}} information may only waste CPU cycles. In fact, we already have something similar in HBase, see HBASE-18478.

was (Author: nihaljain.cs):

{quote}Shouldn't this either be a no-op for filesystems that don't have locality, or something we can just ask the filesystem?
{quote}
The filesystem does not directly return locality as such. We have some logic in HBase to calculate it, based on {{HDFSBlocksDistribution}} information, which we create from the block location information returned by the underlying fs.

I think this solution should be fine and useful: when we know the fs cannot give us meaningful locality, creating this {{HDFSBlocksDistribution}} information may only waste CPU cycles. In fact, we already have something similar in HBase, see [HBASE-18478|https://issues.apache.org/jira/browse/HBASE-18478].

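For illustration, here is a minimal sketch of how such a skip could wrap the computation quoted in this comment. It uses plain Java stand-ins rather than the real Hadoop and HBase classes so that it runs on its own: {{BlockDistributionSketch}}, {{Block}}, {{compute}} and {{localityIndex}} are hypothetical names, the {{Map}} stands in for the Hadoop {{Configuration}}, and the key is the one proposed in the issue description, not a confirmed HBase setting.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical stand-in for HBase's HDFSBlocksDistribution. */
final class BlockDistributionSketch {
    // Key name as proposed in the issue description (an assumption, not a released setting).
    static final String SKIP_KEY = "hbase.block.distribution.skip.computation";

    /** Hypothetical stand-in for a Hadoop BlockLocation: replica hosts plus block length. */
    static final class Block {
        final List<String> hosts;
        final long length;

        Block(List<String> hosts, long length) {
            this.hosts = hosts;
            this.length = length;
        }
    }

    private final Map<String, Long> weightByHost = new HashMap<>();
    private long totalWeight = 0;

    /** Adds one block: its length counts once in the total and once per host holding it. */
    void addHostsAndBlockWeight(List<String> hosts, long weight) {
        totalWeight += weight;
        for (String host : hosts) {
            weightByHost.merge(host, weight, Long::sum);
        }
    }

    /** Fraction of all block bytes that are stored on the given host; 0 if nothing was recorded. */
    float localityIndex(String host) {
        if (totalWeight == 0) {
            return 0f;
        }
        return (float) weightByHost.getOrDefault(host, 0L) / totalWeight;
    }

    /** Builds the distribution, or returns an empty one when the skip flag is set to true. */
    static BlockDistributionSketch compute(Map<String, String> conf, List<Block> blocks) {
        BlockDistributionSketch dist = new BlockDistributionSketch();
        if (Boolean.parseBoolean(conf.getOrDefault(SKIP_KEY, "false"))) {
            return dist; // Skip: no per-block work is done, locality reads as 0 everywhere.
        }
        for (Block b : blocks) {
            dist.addHostsAndBlockWeight(b.hosts, b.length);
        }
        return dist;
    }
}
```

With the flag unset the sketch behaves like the loop above; with it set to {{true}} the caller gets an empty distribution, so anything that reads locality from it sees zero for every host. Whether an empty distribution is the right result for a skipped computation is a design choice this sketch assumes rather than one the issue settles.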
> Allow skipping HDFS block distribution computation
> --------------------------------------------------
>
> Key: HBASE-21672
> URL: https://issues.apache.org/jira/browse/HBASE-21672
> Project: HBase
> Issue Type: Improvement
> Reporter: Nihal Jain
> Assignee: Nihal Jain
> Priority: Major
> Labels: S3
>
> We should have a configuration to skip HDFS block distribution calculation in
> HBase. For example, on file systems that do not surface locality, such as S3,
> calculating block distribution is not useful.
> Currently, we do not have a way to skip HDFS block distribution computation.
> For this, we can provide a new configuration key, say
> {{hbase.block.distribution.skip.computation}} (which would be {{false}} by
> default).
> Users of filesystems such as S3 may choose to set this to {{true}}, thus
> skipping block distribution computation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)