[ https://issues.apache.org/jira/browse/CRUNCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899072#comment-16899072 ]

Andrew Olson commented on CRUNCH-644:
-------------------------------------

A problem was found with this when using a table in a non-default namespace:

{noformat}
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: regionLocations_myTableNamespace:myTableName
    at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    at org.apache.hadoop.fs.Path.<init>(Path.java:172)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.crunch.io.hbase.HFileUtils.writeToHFilesForIncrementalLoad(HFileUtils.java:517)
    at org.apache.crunch.io.hbase.HFileUtils.writePutsToHFilesForIncrementalLoad(HFileUtils.java:608)
    at org.apache.crunch.io.hbase.HFileUtils.writePutsToHFilesForIncrementalLoad(HFileUtils.java:578)
    at org.apache.crunch.io.hbase.HFileUtils.writePutsToHFilesForIncrementalLoad(HFileUtils.java:542)
    ...
{noformat}

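The ":" delimiter in the namespace-qualified table name isn't valid in a path element
(https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/model.html#Paths_and_Path_Elements).
Hadoop's Path constructor treats everything before the first ":" as a URI scheme, so
"regionLocations_myTableNamespace:myTableName" is parsed as the scheme "regionLocations_myTableNamespace"
with the relative path "myTableName", which java.net.URI rejects.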
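A minimal pure-JDK sketch of the failure and of one possible fix. It assumes the simplified scheme-splitting rule in toUri mirrors what Hadoop's Path(String) does on non-Windows platforms; toUri and toPathElement are hypothetical helpers, and replacing ":" with "_" is only an illustration, not necessarily what CRUNCH-688 does.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: toUri is a simplified, hypothetical model of how Hadoop's
// Path(String) splits off a URI scheme; toPathElement is a hypothetical fix.
final class PathElementDemo {
    // Text before the first ':' (when it precedes any '/') is taken as the scheme,
    // and the remainder becomes the path, as in the stack trace above.
    static URI toUri(String pathString) throws URISyntaxException {
        String scheme = null;
        String path = pathString;
        int colon = pathString.indexOf(':');
        int slash = pathString.indexOf('/');
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = pathString.substring(0, colon);
            path = pathString.substring(colon + 1);
        }
        // java.net.URI rejects a scheme combined with a non-empty path not starting with '/'.
        return new URI(scheme, null, path, null, null);
    }

    // Hypothetical helper: make a namespace-qualified table name usable as one path element.
    static String toPathElement(String qualifiedTableName) {
        return qualifiedTableName.replace(':', '_');
    }

    public static void main(String[] args) throws URISyntaxException {
        String table = "myTableNamespace:myTableName";
        try {
            toUri("regionLocations_" + table);
        } catch (URISyntaxException e) {
            System.out.println("rejected: " + e.getReason());
        }
        System.out.println("accepted: " + toUri("regionLocations_" + toPathElement(table)));
    }
}
```

Running main prints the "Relative path in absolute URI" reason for the original name and accepts the sanitized one. Note that a plain character substitution can map two different names to the same element (e.g. "a:b_c" and "a_b:c"); HBase itself avoids this on disk by nesting tables as <namespace>/<table> directories, which would be another option.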

Opened CRUNCH-688 to fix this.



> Set HDFS node affinity on created HFiles to improve locality
> ------------------------------------------------------------
>
>                 Key: CRUNCH-644
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-644
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>            Priority: Major
>             Fix For: 1.0.0
>
>         Attachments: CRUNCH-644.patch
>
>
> When creating HFiles via the {{HFileUtils.writeToHFilesForIncrementalLoad}}
> method, the underlying HDFS blocks of the created HFiles end up on a
> selection of HDFS DataNodes chosen by the HDFS NameNode. This means that
> there is a relatively small chance (depending on cluster size and
> replication factor) that the created HFiles will end up on the same
> physical machine as the region server that will use them, which limits the
> ability to use short-circuit reads from the local file system. Typically,
> this lack of locality is only fully resolved after a major compaction.
>
> It's possible to set a node affinity on HDFS files at creation time, to
> provide a suggestion to the NameNode about a preferred DataNode on which to
> place the blocks. The intention of this ticket is to make use of this
> functionality to set the node affinity during HFile creation in
> {{HFileUtils.writeToHFilesForIncrementalLoad}} so that at least one (HDFS)
> block of each created HFile will be located on the same physical machine as
> the region server that will use the file (assuming HDFS DataNodes are
> running on the same machines as HBase region servers).
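A hedged, pure-JDK sketch of the lookup such a feature needs: find the region that will serve a row key (the region with the greatest start key not after it) and turn its host into a one-element favored-nodes array. This is not the code from CRUNCH-644.patch; the class, its methods, the hostnames, and the use of the Hadoop 3 default DataNode port 9866 are all assumptions for illustration. The array type matches the favored-nodes hint in HDFS's DistributedFileSystem.create overload and HBase's HFile.WriterFactory.withFavoredNodes, but check what address form your Hadoop/HBase versions expect.

```java
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: map a row key to the host of the region that will serve it,
// and turn that host into a favored-node hint for the HFile being written.
final class FavoredNodeSketch {
    // Region start key -> hostname of the region server hosting that region.
    // Keys are compared as unsigned bytes, matching HBase row-key ordering (Java 9+).
    private final TreeMap<byte[], String> regionStartToHost =
        new TreeMap<byte[], String>((a, b) -> Arrays.compareUnsigned(a, b));

    void addRegion(byte[] startKey, String host) {
        regionStartToHost.put(startKey, host);
    }

    // The region containing rowKey is the one with the greatest start key <= rowKey.
    // The first region of a table has an empty start key, so it catches all low keys.
    String hostForRow(byte[] rowKey) {
        Map.Entry<byte[], String> entry = regionStartToHost.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers the given row key");
        }
        return entry.getValue();
    }

    // One-element favored-nodes array for the HFile holding rowKey.
    // createUnresolved avoids a DNS lookup; dataNodePort is an assumption.
    InetSocketAddress[] favoredNodesFor(byte[] rowKey, int dataNodePort) {
        return new InetSocketAddress[] {
            InetSocketAddress.createUnresolved(hostForRow(rowKey), dataNodePort)
        };
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        FavoredNodeSketch regions = new FavoredNodeSketch();
        regions.addRegion(new byte[0], "rs1.example.com");
        regions.addRegion(bytes("g"), "rs2.example.com");
        regions.addRegion(bytes("p"), "rs3.example.com");

        for (String row : new String[] {"apple", "grape", "zebra"}) {
            InetSocketAddress[] hint = regions.favoredNodesFor(bytes(row), 9866);
            System.out.println(row + " -> " + hint[0].getHostString() + ":" + hint[0].getPort());
        }
    }
}
```

Unsigned byte comparison matters here: with signed comparison, a row key starting with a byte such as 0x90 would sort before every ASCII key and be assigned to the wrong region.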



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)