[ 
https://issues.apache.org/jira/browse/HADOOP-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798906#comment-16798906
 ] 

Hudson commented on HADOOP-16147:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16261 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16261/])
HADOOP-16147. Allow CopyListing sequence file keys and values to be more 
(stevel: rev faba3591d32f2e4808c2faeb9472348d52619c8a)
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java


> Allow CopyListing sequence file keys and values to be more easily customized
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-16147
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16147
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Andrew Olson
>            Assignee: Andrew Olson
>            Priority: Major
>             Fix For: 3.2.1
>
>         Attachments: HADOOP-16147-001.patch, HADOOP-16147-002.patch
>
>
> When using the Crunch library to run a distributed copy (CRUNCH-660, 
> CRUNCH-675) at the conclusion of a job, we need to dynamically rename target 
> paths to the preferred destination output part file names, rather than 
> retaining the original source path names.
> A custom CopyListing implementation appears to be the proper solution for 
> this. However, the SimpleCopyListing logic that needs to be adjusted lives in 
> a private method (writeToFileListing), so a relatively large portion of the 
> class would need to be cloned.
> To minimize the amount of code duplication required for such a custom 
> implementation, we propose adding two new protected methods to the 
> CopyListing class that can be used to change the actual keys and/or values 
> written to the copy listing sequence file:
> {noformat}
> protected Text getFileListingKey(Path sourcePathRoot, CopyListingFileStatus fileStatus);
> protected CopyListingFileStatus getFileListingValue(CopyListingFileStatus fileStatus);
> {noformat}
> The SimpleCopyListing class would then be modified to call these methods as 
> follows:
> {noformat}
> fileListWriter.append(
>    getFileListingKey(sourcePathRoot, fileStatus),
>    getFileListingValue(fileStatus));
> {noformat}
> The default implementations would simply preserve the present behavior of the 
> SimpleCopyListing class, and could reside in either CopyListing or 
> SimpleCopyListing, whichever is preferable.
> {noformat}
> protected Text getFileListingKey(Path sourcePathRoot, CopyListingFileStatus fileStatus) {
>    return new Text(DistCpUtils.getRelativePath(sourcePathRoot, fileStatus.getPath()));
> }
> protected CopyListingFileStatus getFileListingValue(CopyListingFileStatus fileStatus) {
>    return fileStatus;
> }
> {noformat}
> Please let me know if this proposal seems to be on the right track. If so, I 
> can provide a patch.
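
The hook pattern proposed above can be illustrated with a minimal, self-contained sketch. It does not use the real Hadoop classes: FileEntry, BaseListing, and RenamingListing are hypothetical stand-ins for CopyListingFileStatus, SimpleCopyListing, and a custom subclass, and plain strings stand in for Path and Text. The part-file naming scheme is likewise only an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyListingHookSketch {

    /** Hypothetical stand-in for CopyListingFileStatus; carries only a path. */
    static final class FileEntry {
        final String path;
        FileEntry(String path) { this.path = path; }
    }

    /** Hypothetical stand-in for SimpleCopyListing with the two proposed hooks. */
    static class BaseListing {
        // Stand-in for the sequence file: each element is a {key, value path} pair.
        final List<String[]> written = new ArrayList<>();

        // Default key: the entry's path with the source root prefix removed.
        protected String getFileListingKey(String sourcePathRoot, FileEntry entry) {
            return entry.path.substring(sourcePathRoot.length());
        }

        // Default value: the entry itself, unchanged.
        protected FileEntry getFileListingValue(FileEntry entry) {
            return entry;
        }

        // The write step stays in the base class and only consults the hooks.
        final void writeToFileListing(String sourcePathRoot, FileEntry entry) {
            written.add(new String[] {
                getFileListingKey(sourcePathRoot, entry),
                getFileListingValue(entry).path
            });
        }
    }

    /** Hypothetical subclass that renames targets to part-file style names. */
    static class RenamingListing extends BaseListing {
        private int counter = 0;

        @Override
        protected String getFileListingKey(String sourcePathRoot, FileEntry entry) {
            // Assumed naming scheme, chosen only for this example.
            return String.format("/part-m-%05d", counter++);
        }
    }

    public static void main(String[] args) {
        BaseListing base = new BaseListing();
        base.writeToFileListing("/src", new FileEntry("/src/data/a.txt"));
        System.out.println(base.written.get(0)[0]);

        BaseListing renaming = new RenamingListing();
        renaming.writeToFileListing("/src", new FileEntry("/src/data/a.txt"));
        System.out.println(renaming.written.get(0)[0]);
    }
}
```

In this sketch the subclass overrides only the key hook, so the listing logic itself does not have to be copied, which is the duplication the proposal aims to avoid.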



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
