[jira] [Commented] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat

Xiang Li (JIRA) Thu, 07 Dec 2017 20:26:02 -0800

    [ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283035#comment-16283035 ]


Xiang Li commented on HBASE-15482:
----------------------------------

[~tedyu], thanks very much for your comments!
Patch 001 has been updated to address your comments as well as the errors 
reported by checkstyle:
* "hbase.TableSnapshotInputFormat.locality" has been renamed to 
"hbase.TableSnapshotInputFormat.locality.enable".
* The truncation of locations is moved into getBestLocations().
* The errors reported by checkstyle are corrected.
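To illustrate how the renamed flag is meant to be used, here is a minimal sketch. It assumes that locality stays enabled by default and that disabling it means the block-location lookup is skipped entirely, with no locations handed to the split. A plain java.util.Map stands in for Hadoop's Configuration (with a real Configuration this would be roughly conf.getBoolean(key, true)); LocalityToggleSketch, locationsFor() and the Supplier-based lookup are hypothetical names, not HBase API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of how the renamed flag could gate the block-location lookup.
 * A plain Map stands in for org.apache.hadoop.conf.Configuration.
 */
final class LocalityToggleSketch {
  // Key name as given in the comment; the default of true (locality on) is an assumption.
  static final String LOCALITY_ENABLE_KEY = "hbase.TableSnapshotInputFormat.locality.enable";
  static final boolean LOCALITY_ENABLE_DEFAULT = true;

  /** Reads the flag, falling back to the assumed default when the key is absent. */
  static boolean isLocalityEnabled(Map<String, String> conf) {
    String value = conf.get(LOCALITY_ENABLE_KEY);
    return value == null ? LOCALITY_ENABLE_DEFAULT : Boolean.parseBoolean(value.trim());
  }

  /** Runs the (possibly slow) lookup only when locality is enabled; otherwise returns null. */
  static List<String> locationsFor(Map<String, String> conf, Supplier<List<String>> lookup) {
    return isLocalityEnabled(conf) ? lookup.get() : null;
  }
}
```

Returning null rather than an empty list for the disabled case matches the null-tolerant split construction discussed below, but that pairing is this sketch's assumption.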

Regarding {{moving the truncation of locations into getBestLocations()}}:
The code now has different logic for different combinations of 
hostAndWeights.length and numTopsAtMost.
There is also a small behavior change in getBestLocations() when 
hostAndWeights.length is 0:
* Originally, it returned an empty list.
* After the change, it returns null. I think we do not need to allocate an 
empty list here, because the locations are used to construct 
TableSnapshotInputFormatImpl.InputSplit, whose constructor checks for null 
as follows:
{code:title=hbase/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java|borderStyle=solid}
public InputSplit(TableDescriptor htd, HRegionInfo regionInfo, List<String> locations,
    Scan scan, Path restoreDir) {
  this.htd = htd;
  this.regionInfo = regionInfo;
  if (locations == null || locations.isEmpty()) { // <--- here
    this.locations = new String[0];
  } else {
    this.locations = locations.toArray(new String[locations.size()]);
  }
  try {
    this.scan = scan != null ? TableMapReduceUtil.convertScanToString(scan) : "";
  } catch (IOException e) {
    LOG.warn("Failed to convert Scan to String", e);
  }

  this.restoreDir = restoreDir.toString();
}
{code}
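The null check in the constructor above is what makes returning null from getBestLocations() safe. A minimal standalone sketch of just that normalization step shows that null and an empty list produce the same empty String[]. SplitLocationsSketch and toLocationArray() are hypothetical names that mirror the quoted constructor logic; they are not HBase API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch mirroring the null/empty check in the InputSplit constructor above. */
final class SplitLocationsSketch {
  static String[] toLocationArray(List<String> locations) {
    if (locations == null || locations.isEmpty()) {
      return new String[0]; // null and an empty list end up identical
    }
    return locations.toArray(new String[0]);
  }

  public static void main(String[] args) {
    System.out.println(toLocationArray(null).length);                            // 0
    System.out.println(toLocationArray(List.of()).length);                       // 0
    System.out.println(Arrays.toString(toLocationArray(List.of("rs1", "rs2")))); // [rs1, rs2]
  }
}
```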
Also, TableSnapshotInputFormatImpl is @InterfaceAudience.Private, and there are no 
other calls to getBestLocations() in the whole HBase project except in UTs. One UT 
has been updated to reflect the change above.
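To make the new contract of getBestLocations() concrete, here is a simplified, hypothetical model. It assumes the hosts arrive sorted by descending weight, returns null when there are none, and otherwise truncates the result to at most numTopsAtMost hosts. Clamping values below 1 to 1 is an assumption of this sketch. HostAndWeight is a stand-in class, and the real method may take other parameters and apply additional filtering that this comment does not describe.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of getBestLocations() after the truncation moved into it. */
final class BestLocationsSketch {
  /** Hypothetical stand-in for a (host, weight) pair. */
  static final class HostAndWeight {
    final String host;
    final long weight;

    HostAndWeight(String host, long weight) {
      this.host = host;
      this.weight = weight;
    }
  }

  /**
   * @param hostAndWeights hosts sorted by descending weight (assumed by this sketch)
   * @param numTopsAtMost  upper bound on the number of hosts returned
   * @return null when there are no hosts, otherwise at most numTopsAtMost hosts
   */
  static List<String> getBestLocations(HostAndWeight[] hostAndWeights, int numTopsAtMost) {
    if (hostAndWeights.length == 0) {
      return null; // new behavior: no empty list is allocated
    }
    // Assumption: a non-positive limit is treated as 1 so the top host is always kept.
    int top = Math.min(Math.max(numTopsAtMost, 1), hostAndWeights.length);
    List<String> locations = new ArrayList<>(top);
    for (int i = 0; i < top; i++) {
      locations.add(hostAndWeights[i].host);
    }
    return locations;
  }
}
```

Because the truncation now happens inside the method, callers no longer need to cut the list down themselves, and a null result flows straight into the InputSplit constructor's null check.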

> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-15482
>                 URL: https://issues.apache.org/jira/browse/HBASE-15482
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Liyin Tang
>            Assignee: Xiang Li
>            Priority: Minor
>             Fix For: 2.1.0
>
>         Attachments: HBASE-15482.master.000.patch
>
>
> When an MR job reads from SnapshotInputFormat, it needs to calculate the 
> splits based on the block locations in order to get the best locality. However, 
> this process may take a long time for large snapshots. 
> In some setups, the computing layer (Spark, Hive or Presto) could run outside 
> of the HBase cluster. In these scenarios, block locality doesn't matter. 
> Therefore, it would be great to have an option to skip calculating the block 
> locations for every job. That would be super useful for the Hive/Presto/Spark 
> connectors.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
