[ https://issues.apache.org/jira/browse/CRUNCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899072#comment-16899072 ]
Andrew Olson commented on CRUNCH-644:
-------------------------------------
A problem was found with this change when the table is in a non-default namespace:
{noformat}
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: regionLocations_myTableNamespace:myTableName
    at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    at org.apache.hadoop.fs.Path.<init>(Path.java:172)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.crunch.io.hbase.HFileUtils.writeToHFilesForIncrementalLoad(HFileUtils.java:517)
    at org.apache.crunch.io.hbase.HFileUtils.writePutsToHFilesForIncrementalLoad(HFileUtils.java:608)
    at org.apache.crunch.io.hbase.HFileUtils.writePutsToHFilesForIncrementalLoad(HFileUtils.java:578)
    at org.apache.crunch.io.hbase.HFileUtils.writePutsToHFilesForIncrementalLoad(HFileUtils.java:542)
    ...
{noformat}
The ":" delimiter in the qualified table name is not a valid character in a Hadoop path element; see
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/model.html#Paths_and_Path_Elements
Opened CRUNCH-688 to fix this.
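
For illustration, here is a minimal JDK-only sketch of the failure and one possible workaround. It assumes the path parsing treats everything before the first ":" as a URI scheme and the rest as the path, which matches the message in the trace. The class and method names are hypothetical, and replacing ":" with "_" is only one option, not necessarily what CRUNCH-688 will do.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RegionLocationsFileName {
    // Prefix copied from the file name that appears in the stack trace.
    static final String PREFIX = "regionLocations_";

    // Hypothetical helper: swap the namespace delimiter for a character that
    // is legal in a path element. Not necessarily the fix chosen in CRUNCH-688.
    static String safeFileName(String qualifiedTableName) {
        return PREFIX + qualifiedTableName.replace(':', '_');
    }

    // Assumed model of the failure: the text before the first ':' is treated
    // as the URI scheme and the remainder as the path. Returns the reason
    // reported by java.net.URI, or null if the name is accepted.
    static String uriRejectionReason(String name) {
        int colon = name.indexOf(':');
        String scheme = colon < 0 ? null : name.substring(0, colon);
        String path = colon < 0 ? name : name.substring(colon + 1);
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }
}
```

One caveat with this kind of substitution: "ns:tbl" and a default-namespace table literally named "ns_tbl" would map to the same file name, so a real fix may want a delimiter that cannot occur in table names.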
> Set HDFS node affinity on created HFiles to improve locality
> ------------------------------------------------------------
>
> Key: CRUNCH-644
> URL: https://issues.apache.org/jira/browse/CRUNCH-644
> Project: Crunch
> Issue Type: Improvement
> Reporter: Gabriel Reid
> Assignee: Gabriel Reid
> Priority: Major
> Fix For: 1.0.0
>
> Attachments: CRUNCH-644.patch
>
>
> When creating HFiles via the {{HFileUtils.writeToHFilesForIncrementalLoad}}
> method, the underlying HDFS blocks of the created HFiles are placed on HDFS
> data nodes chosen by the HDFS NameNode. This means that there is only a
> relatively small chance (depending on cluster size and replication factor)
> that the created HFiles will end up on the same physical machine as the
> region server that will use them, which limits the ability to use
> short-circuit reads from the local file system. Typically, this lack of
> locality is only fully resolved after a major compaction.
> It's possible to set a node affinity on HDFS files at creation time, which
> gives the NameNode a hint about the preferred data node for the file's
> blocks. The intention of this ticket is to use this functionality to set the
> node affinity during HFile creation in
> {{HFileUtils.writeToHFilesForIncrementalLoad}}, so that at least one (HDFS)
> block of each created HFile will be located on the same physical machine as
> the region server that will be using the file (assuming HDFS data nodes are
> running on the same machines as HBase region servers).
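
A rough sketch of the lookup step described above, using only the JDK: given a map from region start key to region server host, find the host serving a row key and turn it into an address that could be offered to HDFS as a preferred node. The class name, the String row keys (HBase uses byte arrays), and the port value are illustrative assumptions, not taken from the attached patch.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.TreeMap;

class FavoredNodeLookup {
    // Region start key -> region server host. The first region of a table
    // has an empty start key, so "" is expected to be present.
    private final TreeMap<String, String> hostByStartKey;
    // Assumed data node port; the real value depends on cluster configuration.
    private final int dataNodePort;

    FavoredNodeLookup(Map<String, String> hostByStartKey, int dataNodePort) {
        this.hostByStartKey = new TreeMap<>(hostByStartKey);
        this.dataNodePort = dataNodePort;
    }

    // Returns the preferred node for the region containing rowKey, or null
    // if no region start key is less than or equal to rowKey.
    InetSocketAddress favoredNodeFor(String rowKey) {
        // The containing region is the one with the greatest start key
        // that is <= rowKey.
        Map.Entry<String, String> region = hostByStartKey.floorEntry(rowKey);
        if (region == null) {
            return null;
        }
        // Unresolved on purpose: no DNS lookup is needed to build the hint.
        return InetSocketAddress.createUnresolved(region.getValue(), dataNodePort);
    }
}
```

The resulting address is only a hint; as the description says, the NameNode is free to place blocks elsewhere.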