[ 
https://issues.apache.org/jira/browse/HDFS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208914#comment-14208914
 ] 

Yongjun Zhang commented on HDFS-7312:
-------------------------------------

Hi [~jprosser],

Thanks for the nice work here. I looked through, it looks good, except for a 
couple of improvements and nits:

Improvements:

* For the following new test created, to achieve better code sharing
{code}
 /** copy files from dfs file system to dfs file system with skiptmp */
  public void testCopyFromDfsToDfsWithSkiptmp() throws Exception {
{code}
suggest to create a new private method with an additional boolean parameter 
skipTmp, {{private void testCopyFromDfsToDfs(final boolean skipTmp}}. Then you 
can make the pre-existing test testCopyFromDfsToDfs and your new test to call 
this private method with false and true accordingly.

* This same above suggestion applies to {{testCopySingleFileWithSkiptmp()}}.

* Line 1221, do we really need to ceate the tmpDir here? For example, if we 
copy a single file to a destination file, why a tmpDir is needed? I think we 
can avoid doing this (removing that block of code), and have a file-existence 
checking when doing the fullyDelete. The point is, the TMP_DIR_LABEL may not 
always be created, we don't have to create it just for the purpose of deleting 
it. Right?

Nits:

* About usage description "Copy files directly to the final destination.", 
maybe we can change it to "Instead, copy files directly to the final 
destination." ?
* Line 394, "// filename and the path becomes its parent directory.", replace 
"path" with "destPath"
* Line 1218, rename "tmpDirRoot" to "tmpDirPrefix"?
* Line 396, tab key not replaced with spaces. 
* line 400  remove newly added extra newline
* Line 956, replace "The job to configure" with "The job configuration"
* Line 1218, line is too long, need to be <= 80 columns based on hadoop code 
guideline;

BTW, I guess you have tried to test it out in real clusters, right?

Thanks a lot.


> Update DistCp v1 to optionally not use tmp location
> ---------------------------------------------------
>
>                 Key: HDFS-7312
>                 URL: https://issues.apache.org/jira/browse/HDFS-7312
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.5.1
>            Reporter: Joseph Prosser
>            Assignee: Joseph Prosser
>            Priority: Minor
>         Attachments: HDFS-7312.001.patch, HDFS-7312.002.patch, 
> HDFS-7312.003.patch, HDFS-7312.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> DistCp v1 currently copies files to a tmp location and then renames that to 
> the specified destination.  This can cause performance issues on filesystems 
> such as S3.  A -skiptmp flag will be added to bypass this step and copy 
> directly to the destination.  This feature mirrors a similar one added to 
> HBase ExportSnapshot 
> [HBASE-11119|https://issues.apache.org/jira/browse/HBASE-11119]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to