[ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202543#comment-15202543 ]
Dave Latham commented on HBASE-15482:
-------------------------------------

Yes, I quite agree - we also skip it.

> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15482
>                 URL: https://issues.apache.org/jira/browse/HBASE-15482
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Liyin Tang
>            Priority: Minor
>
> When an MR job reads from SnapshotInputFormat, it needs to calculate the
> splits based on the block locations in order to get the best locality. However,
> this process may take a long time for large snapshots.
> In some setups, the computing layer (Spark, Hive, or Presto) could run outside
> of the HBase cluster. In these scenarios, block locality doesn't matter.
> Therefore, it would be great to have an option to skip calculating the block
> locations for every job. That would be super useful for the Hive/Presto/Spark
> connectors.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)