HDFS-9048. DistCp documentation is out-of-dated (Daisuke Kobayashi via iwasakims)
Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/33a412e8
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/33a412e8
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/33a412e8

Branch: refs/heads/yarn-2877
Commit: 33a412e8a4ab729d588a9576fb7eb90239c6e383
Parents: eb864d3
Author: Masatake Iwasaki <iwasak...@apache.org>
Authored: Thu Mar 3 18:57:23 2016 +0900
Committer: Masatake Iwasaki <iwasak...@apache.org>
Committed: Thu Mar 3 18:57:23 2016 +0900

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt      |  3 +++
 .../hadoop-distcp/src/site/markdown/DistCp.md.vm | 13 +++++++------
 2 files changed, 10 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/hadoop/blob/33a412e8/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index d922cd7..0ddecaf 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -2916,6 +2916,9 @@ Release 2.7.3 - UNRELEASED
     HDFS-8791. block ID-based DN storage layout can be very slow for datanode
     on ext4 (Chris Trezzo via kihwal)
 
+    HDFS-9048. DistCp documentation is out-of-dated
+    (Daisuke Kobayashi via iwasakims)
+
   OPTIMIZATIONS
 
   BUG FIXES

http://git-wip-us.apache.org/repos/asf/hadoop/blob/33a412e8/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm b/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
index 8f64ea2..bac5ecc 100644
--- a/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
+++ b/hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
@@ -412,12 +412,13 @@ $H3 Map sizing
 
 $H3 Copying Between Versions of HDFS
 
-  For copying between two different versions of Hadoop, one will usually use
-  HftpFileSystem. This is a read-only FileSystem, so DistCp must be run on the
-  destination cluster (more specifically, on NodeManagers that can write to the
-  destination cluster). Each source is specified as
-  `hftp://<dfs.http.address>/<path>` (the default `dfs.http.address` is
-  `<namenode>:50070`).
+  For copying between two different major versions of Hadoop (e.g. between 1.X
+  and 2.X), one will usually use WebHdfsFileSystem. Unlike the previous
+  HftpFileSystem, as webhdfs is available for both read and write operations,
+  DistCp can be run on both source and destination cluster.
+  Remote cluster is specified as `webhdfs://<namenode_hostname>:<http_port>`.
+  When copying between same major versions of Hadoop cluster (e.g. between 2.X
+  and 2.X), use hdfs protocol for better performance.
 
 $H3 MapReduce and other side-effects
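
Illustrative note (not part of the commit): the rule stated in the updated paragraph can be sketched as a small helper that picks the URI scheme for the remote cluster. This is a minimal sketch under stated assumptions: the function names `remote_uri` and `distcp_command`, the hostnames, and the RPC port 8020 are hypothetical and do not come from the patch; port 50070 is taken from the removed text as an example HTTP port only. The only CLI form relied on is `hadoop distcp <source> <destination>`.

```python
# Sketch of the scheme-selection rule described in the patched DistCp.md.vm.
# All names, hosts, and ports here are illustrative assumptions.

def remote_uri(local_major, remote_major, host, http_port, rpc_port, path):
    """Return the URI to use for the remote cluster.

    Different major versions (e.g. 1.X vs 2.X): use webhdfs over the
    NameNode HTTP port. Same major version: use hdfs, which the patch
    recommends for better performance.
    """
    if local_major != remote_major:
        return f"webhdfs://{host}:{http_port}{path}"
    return f"hdfs://{host}:{rpc_port}{path}"


def distcp_command(src_uri, dst_uri):
    """Build the argument list for `hadoop distcp <source> <destination>`."""
    return ["hadoop", "distcp", src_uri, dst_uri]


if __name__ == "__main__":
    # Hypothetical case: running on a 2.X cluster, reading from a 1.X cluster.
    src = remote_uri(2, 1, "nn1.example.com", 50070, 8020, "/data")
    dst = "hdfs://nn2.example.com:8020/data"
    print(" ".join(distcp_command(src, dst)))
```

Because webhdfs supports both reads and writes, the same helper applies whether the remote cluster is the source or the destination of the copy.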