[ https://issues.apache.org/jira/browse/HADOOP-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798584#comment-16798584 ]
Steve Loughran commented on HADOOP-16147:
-----------------------------------------

Makes sense. Are you really sure you couldn't come up with a test?

> Allow CopyListing sequence file keys and values to be more easily customized
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-16147
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16147
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Andrew Olson
>            Assignee: Andrew Olson
>            Priority: Major
>         Attachments: HADOOP-16147-001.patch, HADOOP-16147-002.patch
>
>
> We have encountered a scenario where, when using the Crunch library to run a
> distributed copy (CRUNCH-660, CRUNCH-675) at the conclusion of a job, we need
> to dynamically rename target paths to the preferred destination output part
> file names, rather than retaining the original source path names.
> A custom CopyListing implementation appears to be the proper solution for
> this. However, the place where the current SimpleCopyListing logic needs to be
> adjusted is in a private method (writeToFileListing), so a relatively large
> portion of the class would need to be cloned.
> To minimize the amount of code duplication required for such a custom
> implementation, we propose adding two new protected methods to the
> CopyListing class that can be used to change the actual keys and/or values
> written to the copy listing sequence file:
> {noformat}
> protected Text getFileListingKey(Path sourcePathRoot,
>     CopyListingFileStatus fileStatus);
> protected CopyListingFileStatus getFileListingValue(
>     CopyListingFileStatus fileStatus);
> {noformat}
> The SimpleCopyListing class would then be modified to consume these methods
> as follows:
> {noformat}
> fileListWriter.append(
>     getFileListingKey(sourcePathRoot, fileStatus),
>     getFileListingValue(fileStatus));
> {noformat}
> The default implementations would simply preserve the present behavior of the
> SimpleCopyListing class, and could reside in either CopyListing or
> SimpleCopyListing, whichever is preferable.
> {noformat}
> protected Text getFileListingKey(Path sourcePathRoot,
>     CopyListingFileStatus fileStatus) {
>   return new Text(DistCpUtils.getRelativePath(sourcePathRoot,
>       fileStatus.getPath()));
> }
> protected CopyListingFileStatus getFileListingValue(
>     CopyListingFileStatus fileStatus) {
>   return fileStatus;
> }
> {noformat}
> Please let me know if this proposal seems to be on the right track. If so, I
> can provide a patch.
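
For illustration only, the hook pattern proposed above can be sketched without any Hadoop dependency. Everything in this sketch is a hypothetical stand-in: `FileStatusStub` plays the role of CopyListingFileStatus, `ListingSketch` plays the role of SimpleCopyListing, plain strings replace `Path` and `Text`, and an in-memory list replaces the sequence file writer. The `RenamingListingSketch` subclass and its `/part-NNNNN` naming scheme are assumptions chosen to mirror the Crunch renaming use case, not code from the attached patches.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Small demo driver (hypothetical). */
class CopyListingHooksDemo {
    public static void main(String[] args) {
        ListingSketch listing = new RenamingListingSketch();
        listing.writeToFileListing("/src", new FileStatusStub("/src/out/file-x.avro"));
        listing.writeToFileListing("/src", new FileStatusStub("/src/out/file-y.avro"));
        for (Map.Entry<String, FileStatusStub> e : listing.getEntries()) {
            System.out.println(e.getKey() + " <- " + e.getValue().getPath());
        }
    }
}

/** Hypothetical stand-in for CopyListingFileStatus: only carries a path. */
class FileStatusStub {
    private final String path;

    FileStatusStub(String path) {
        this.path = path;
    }

    String getPath() {
        return path;
    }
}

/** Hypothetical stand-in for SimpleCopyListing with the two proposed hooks. */
class ListingSketch {
    // In-memory substitute for the copy listing sequence file.
    private final List<Map.Entry<String, FileStatusStub>> entries = new ArrayList<>();

    // Default key: the source path relative to the source root, roughly
    // what the proposal delegates to DistCpUtils.getRelativePath.
    protected String getFileListingKey(String sourcePathRoot, FileStatusStub fileStatus) {
        String path = fileStatus.getPath();
        return path.startsWith(sourcePathRoot)
            ? path.substring(sourcePathRoot.length())
            : path;
    }

    // Default value: the file status, unchanged.
    protected FileStatusStub getFileListingValue(FileStatusStub fileStatus) {
        return fileStatus;
    }

    // The shared write logic calls the hooks instead of hard-coding key and value.
    final void writeToFileListing(String sourcePathRoot, FileStatusStub fileStatus) {
        entries.add(new SimpleEntry<>(
            getFileListingKey(sourcePathRoot, fileStatus),
            getFileListingValue(fileStatus)));
    }

    List<Map.Entry<String, FileStatusStub>> getEntries() {
        return entries;
    }
}

/** Hypothetical subclass: renames every target to a sequential part file name. */
class RenamingListingSketch extends ListingSketch {
    private int nextPart = 0;

    @Override
    protected String getFileListingKey(String sourcePathRoot, FileStatusStub fileStatus) {
        return String.format(Locale.ROOT, "/part-%05d", nextPart++);
    }
}
```

The point of the sketch is that the subclass overrides a single small method while the write logic stays in the base class, which is the code duplication the proposal aims to avoid.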