[ 
https://issues.apache.org/jira/browse/HBASE-21672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737370#comment-16737370
 ] 

Nihal Jain edited comment on HBASE-21672 at 1/8/19 6:23 PM:
------------------------------------------------------------

{quote}Shouldn't this either be a no-op for filesystems that don't have 
locality, or something we can just ask the filesystem?
{quote}
The filesystem does not report locality directly. HBase has its own logic to 
calculate it, based on the {{HDFSBlocksDistribution}} information, which we 
build from the block location information returned by the underlying 
filesystem. 
{code:java}
  // Builds the per-host block weights for one byte range of a file.
  public static HDFSBlocksDistribution computeHDFSBlocksDistribution(
    final FileSystem fs, FileStatus status, long start, long length)
    throws IOException {
    HDFSBlocksDistribution blocksDistribution = new HDFSBlocksDistribution();
    // Ask the filesystem which hosts hold each block in the requested range.
    BlockLocation[] blockLocations =
      fs.getFileBlockLocations(status, start, length);
    for (BlockLocation bl : blockLocations) {
      // Credit every host that stores this block with the block's length.
      String[] hosts = bl.getHosts();
      long len = bl.getLength();
      blocksDistribution.addHostsAndBlockWeight(hosts, len);
    }

    return blocksDistribution;
  }
{code}
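
To illustrate how per-host weights can turn into a locality figure, here is a minimal sketch. {{BlockWeightSketch}} is a hypothetical, simplified stand-in written for this example, not the real {{HDFSBlocksDistribution}} class, and the host names and block sizes are made up. It assumes locality for a host is the share of total block bytes that have a replica on that host.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for HDFSBlocksDistribution (not the HBase class). */
public class BlockWeightSketch {
    private final Map<String, Long> weightByHost = new HashMap<>();
    private long totalWeight = 0;

    /** Credits each replica host with the block length; the block counts once in the total. */
    public void addHostsAndBlockWeight(String[] hosts, long weight) {
        if (hosts == null || hosts.length == 0) {
            return; // no location information, nothing to record
        }
        for (String host : hosts) {
            weightByHost.merge(host, weight, Long::sum);
        }
        totalWeight += weight;
    }

    /** Share of all block bytes that have a replica on the given host, in [0, 1]. */
    public float localityIndex(String host) {
        if (totalWeight == 0) {
            return 0f;
        }
        return (float) weightByHost.getOrDefault(host, 0L) / (float) totalWeight;
    }

    public static void main(String[] args) {
        BlockWeightSketch d = new BlockWeightSketch();
        // Two 128-byte blocks with three replicas each (made-up hosts).
        d.addHostsAndBlockWeight(new String[] {"rs1", "rs2", "rs3"}, 128L);
        d.addHostsAndBlockWeight(new String[] {"rs2", "rs3", "rs4"}, 128L);
        System.out.println(d.localityIndex("rs1")); // 0.5
        System.out.println(d.localityIndex("rs2")); // 1.0
        System.out.println(d.localityIndex("rs9")); // 0.0
    }
}
```

On a filesystem whose block locations do not name real hosts, every region server would score about the same, which is why the computation adds little there.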
 

I think this solution should be fine and useful: when we know the filesystem 
cannot give us meaningful locality, creating the {{HDFSBlocksDistribution}} 
information only wastes CPU cycles. In fact, we already have something 
similar in HBase; see HBASE-18478.
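
A rough sketch of the idea, assuming the key name suggested in the issue description ({{hbase.block.distribution.skip.computation}}, default {{false}}). This is not the actual patch: {{SkipComputationSketch}} and {{computeOrSkip}} are hypothetical names, {{java.util.Properties}} stands in for the HBase configuration object, and a plain map stands in for the distribution.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical sketch of guarding the block location lookup with a config flag. */
public class SkipComputationSketch {
    // Key name as proposed in the issue description; the final name may differ.
    public static final String SKIP_KEY = "hbase.block.distribution.skip.computation";

    /** Returns an empty distribution without calling the lookup when the flag is true. */
    public static Map<String, Long> computeOrSkip(
            Properties conf, Supplier<Map<String, Long>> blockLocationLookup) {
        boolean skip = Boolean.parseBoolean(conf.getProperty(SKIP_KEY, "false"));
        if (skip) {
            return Collections.emptyMap(); // no filesystem call is made
        }
        return blockLocationLookup.get();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SKIP_KEY, "true");
        // The lookup throws if invoked, so reaching the print shows it was skipped.
        Map<String, Long> result = computeOrSkip(conf, () -> {
            throw new IllegalStateException("lookup should have been skipped");
        });
        System.out.println(result.isEmpty()); // true
    }
}
```

With the flag left at its default, the lookup runs as before, so existing HDFS deployments would see no change in behavior.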


> Allow skipping HDFS block distribution computation
> --------------------------------------------------
>
>                 Key: HBASE-21672
>                 URL: https://issues.apache.org/jira/browse/HBASE-21672
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Nihal Jain
>            Assignee: Nihal Jain
>            Priority: Major
>              Labels: S3
>
> We should have a configuration to skip HDFS block distribution calculation in 
> HBase. For example, on filesystems that do not surface locality, such as S3, 
> calculating the block distribution is not useful.
> Currently, we do not have a way to skip HDFS block distribution computation. 
> For this, we can provide a new configuration key, say 
> {{hbase.block.distribution.skip.computation}} (which would be {{false}} by 
> default).
> Users of filesystems such as S3 may choose to set this to {{true}}, thus 
> skipping block distribution computation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
