Andrew Olson created HADOOP-16147:
-------------------------------------
Summary: Allow CopyListing sequence file keys and values to be more easily customized
Key: HADOOP-16147
URL: https://issues.apache.org/jira/browse/HADOOP-16147
Project: Hadoop Common
Issue Type: Improvement
Components: tools/distcp
Reporter: Andrew Olson
When using the Crunch library to run a distributed copy at the conclusion of a
job (CRUNCH-660, CRUNCH-675), we have encountered a scenario where we need to
dynamically rename target paths to the preferred destination output part file
names, rather than retaining the original source path names.
A custom CopyListing implementation appears to be the proper solution for this.
However, the SimpleCopyListing logic that needs to be adjusted lives in a
private method (writeToFileListing), so a relatively large portion of the class
would need to be cloned.
To minimize the code duplication required for such a custom implementation, we
propose adding two new protected methods to the CopyListing class that can be
used to change the actual keys and/or values written to the copy listing
sequence file:
{noformat}
protected Text getFileListingKey(Path sourcePathRoot,
    CopyListingFileStatus fileStatus);
protected CopyListingFileStatus getFileListingValue(
    CopyListingFileStatus fileStatus);
{noformat}
The SimpleCopyListing class would then be modified to call these methods as
follows:
{noformat}
fileListWriter.append(
    getFileListingKey(sourcePathRoot, fileStatus),
    getFileListingValue(fileStatus));
{noformat}
The default implementations would simply preserve the present behavior of the
SimpleCopyListing class, and could reside in either CopyListing or
SimpleCopyListing, whichever is preferable.
{noformat}
protected Text getFileListingKey(Path sourcePathRoot,
    CopyListingFileStatus fileStatus) {
  return new Text(DistCpUtils.getRelativePath(sourcePathRoot,
      fileStatus.getPath()));
}

protected CopyListingFileStatus getFileListingValue(
    CopyListingFileStatus fileStatus) {
  return fileStatus;
}
{noformat}
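To illustrate how a custom implementation might use these hooks, here is a
minimal, self-contained sketch. It is not Hadoop code: plain strings stand in
for Path, Text and CopyListingFileStatus, a map stands in for the sequence
file, and the class names SimpleListing and RenamingListing as well as the
part-m-NNNNN naming scheme are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch only: strings replace Hadoop's Path, Text and CopyListingFileStatus
// so that the example runs without Hadoop on the classpath.
class CopyListingSketch {

    /** Hypothetical stand-in for SimpleCopyListing with the two proposed hooks. */
    static class SimpleListing {
        // Stand-in for the copy listing sequence file (key -> value, in write order).
        final Map<String, String> fileList = new LinkedHashMap<>();

        // Mirrors the proposed default: the key is the path relative to the source root.
        protected String getFileListingKey(String sourcePathRoot, String filePath) {
            return filePath.substring(sourcePathRoot.length());
        }

        // Mirrors the proposed default: the value is passed through unchanged.
        protected String getFileListingValue(String filePath) {
            return filePath;
        }

        // The writer loop calls the hooks instead of building keys and values inline.
        void writeToFileListing(String sourcePathRoot, List<String> filePaths) {
            for (String filePath : filePaths) {
                fileList.put(getFileListingKey(sourcePathRoot, filePath),
                             getFileListingValue(filePath));
            }
        }
    }

    /** Hypothetical custom listing that renames each target to a part file name. */
    static class RenamingListing extends SimpleListing {
        private int nextPart = 0;

        @Override
        protected String getFileListingKey(String sourcePathRoot, String filePath) {
            // Assumed naming scheme, for illustration only.
            return String.format(Locale.ROOT, "/part-m-%05d", nextPart++);
        }
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("/src/a/out1.avro", "/src/b/out2.avro");
        SimpleListing listing = new RenamingListing();
        listing.writeToFileListing("/src", sources);
        listing.fileList.forEach((key, value) -> System.out.println(key + " <- " + value));
    }
}
```

With hooks of this kind, only the key computation is overridden, and the
listing loop itself does not need to be cloned into the subclass.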
Please let me know if this proposal seems to be on the right track. If so, I can
provide a patch.