steveloughran commented on a change in pull request #1794: HADOOP-15887: Add an 
option to avoid writing data locally in Distcp
URL: https://github.com/apache/hadoop/pull/1794#discussion_r363378165
 
 

 ##########
 File path: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
 ##########
 @@ -362,6 +362,7 @@ Command Line Options
 | `-copybuffersize <copybuffersize>` | Size of the copy buffer to use. By 
default, `<copybuffersize>` is set to 8192B | |
 | `-xtrack <path>` | Save information about missing source files to the 
specified path. | This option is only valid with `-update` option. This is an 
experimental property and it cannot be used with `-atomic` option. |
 | `-direct` | Write directly to destination paths | Useful for avoiding 
potentially very expensive temporary file rename operations when the 
destination is an object store |
+| `-noLocalWrite` | Write data to target cluster with data locality disabled. 
| If this option is set, the distcp task will not write data replication to 
local datanode to avoid datanode being imbalanced. This option is suggested to 
be specified when the data to copy is very large and the DistCp job runs on the 
target cluster. |
 
 Review comment:
   suggest:
   
   Write data to an HDFS cluster with data locality disabled. | If this option 
is set, the DistCp tasks will not write data blocks to their local datanodes, 
which avoids those datanodes becoming imbalanced. Recommended when the amount of 
data to copy is very large, the target cluster is HDFS, and the DistCp job runs 
on that target cluster. 
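
To illustrate why skipping the local datanode helps, here is a rough toy model, not HDFS's actual placement policy. It assumes a replication factor of 3, that the default behaviour puts the first replica on the writer's own datanode, and that the remaining replicas go round-robin to the other nodes (real HDFS placement is rack-aware and randomized). The function name `place_blocks` and all of its parameters are hypothetical, made up for this sketch.

```python
from collections import Counter
from itertools import cycle


def place_blocks(num_nodes, writer_nodes, blocks_per_writer,
                 replication=3, no_local_write=False):
    """Toy model: count the replicas that land on each datanode.

    Assumption (not the real HDFS policy): remote replicas are spread
    round-robin over every node except the writer's own node.
    Requires num_nodes - 1 >= replication so replicas of one block
    land on distinct nodes.
    """
    usage = Counter({node: 0 for node in range(num_nodes)})
    for writer in writer_nodes:
        # Every datanode except the one the copy task runs on.
        remotes = cycle([n for n in range(num_nodes) if n != writer])
        for _ in range(blocks_per_writer):
            if no_local_write:
                remote_replicas = replication
            else:
                # Default behaviour in this model: one replica stays local.
                usage[writer] += 1
                remote_replicas = replication - 1
            for _ in range(remote_replicas):
                usage[next(remotes)] += 1
    return usage


if __name__ == "__main__":
    # Copy tasks run on nodes 0 and 1 of a 10-node target cluster.
    default = place_blocks(10, [0, 1], 90)
    no_local = place_blocks(10, [0, 1], 90, no_local_write=True)
    print("default   max/min:", max(default.values()), min(default.values()))
    print("no-local  max/min:", max(no_local.values()), min(no_local.values()))
```

In this model the nodes running the copy tasks accumulate far more replicas than the rest when local writes are allowed, and the spread narrows when they are not. That is the imbalance the option description is talking about.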
