[ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285528#comment-16285528 ]
Jerry He commented on HBASE-15482:
----------------------------------

Hi, [~water]

In the 002 patch, you added 'numTopsAtMost' in getBestLocations. Wouldn't you need another 'break' in the loop, i.e. break out once numTopsAtMost is met?
But again, the new code with 'numTopsAtMost' is probably unnecessary. The comment on the getBestLocations method already explains that you are unlikely to get more than 3 hosts with at least 80% (hbase.tablesnapshotinputformat.locality.cutoff.multiplier) as much block locality as the top host. So you will break out early anyway with the filterWeight check.
Your first patch's logic is good enough. The added comment is good.
{code}
// As hostAndWeights is in descending order,
// we could break the loop as long as we meet a weight which is less than filterWeight
{code}

> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15482
>                 URL: https://issues.apache.org/jira/browse/HBASE-15482
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Liyin Tang
>            Assignee: Xiang Li
>            Priority: Minor
>             Fix For: 2.1.0
>
>         Attachments: HBASE-15482.master.000.patch, 
> HBASE-15482.master.001.patch, HBASE-15482.master.002.patch
>
>
> When an MR job is reading from SnapshotInputFormat, it needs to calculate the 
> splits based on the block locations in order to get the best locality. However, 
> this process may take a long time for large snapshots. 
> In some setups, the computing layer (Spark, Hive or Presto) could run outside 
> of the HBase cluster. In these scenarios, the block locality doesn't matter. 
> Therefore, it would be great to have an option to skip calculating the block 
> locations for every job. That would be super useful for the Hive/Presto/Spark 
> connectors.
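
For reference, the early-exit behaviour discussed in the comment (stop scanning as soon as a host's weight drops below filterWeight, because the hosts are already sorted in descending order) can be sketched roughly as follows. This is a self-contained illustration only, not the HBase implementation: the class BestLocationsSketch, the HostWeight type and the bestLocations signature are hypothetical stand-ins, and the cutoff multiplier is passed in directly instead of being read from hbase.tablesnapshotinputformat.locality.cutoff.multiplier.

```java
import java.util.ArrayList;
import java.util.List;

public class BestLocationsSketch {

    /** Hypothetical stand-in for a (host, block-locality weight) pair. */
    static final class HostWeight {
        final String host;
        final long weight;

        HostWeight(String host, long weight) {
            this.host = host;
            this.weight = weight;
        }
    }

    /**
     * Returns the top host plus every following host whose weight is at least
     * cutoffMultiplier times the top host's weight.
     * Assumes hostsDescending is sorted by weight, highest first.
     */
    static List<String> bestLocations(List<HostWeight> hostsDescending,
                                      double cutoffMultiplier) {
        List<String> locations = new ArrayList<>();
        if (hostsDescending.isEmpty()) {
            return locations;
        }

        HostWeight top = hostsDescending.get(0);
        locations.add(top.host);
        double filterWeight = top.weight * cutoffMultiplier;

        for (int i = 1; i < hostsDescending.size(); i++) {
            HostWeight candidate = hostsDescending.get(i);
            if (candidate.weight < filterWeight) {
                // Input is in descending order, so no later host can qualify.
                break;
            }
            locations.add(candidate.host);
        }
        return locations;
    }

    public static void main(String[] args) {
        List<HostWeight> hosts = List.of(
            new HostWeight("host-a", 100),
            new HostWeight("host-b", 85),
            new HostWeight("host-c", 60),
            new HostWeight("host-d", 10));
        // With a 0.8 multiplier the cutoff is 80, so the scan stops at host-c.
        System.out.println(bestLocations(hosts, 0.8));
    }
}
```

With a multiplier of 0.8, the loop above ends as soon as the first host below the cutoff is seen, which is why an extra cap such as numTopsAtMost adds little in the common case.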