[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707441#comment-14707441 ]
Hudson commented on HDFS-8828:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #294 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/294/])
HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
* hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java

> Utilize Snapshot diff report to build diff copy list in distcp
> --------------------------------------------------------------
>
>                 Key: HDFS-8828
>                 URL: https://issues.apache.org/jira/browse/HDFS-8828
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp, snapshots
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>             Fix For: 2.8.0
>
>         Attachments: HDFS-8828.001.patch, HDFS-8828.002.patch,
> HDFS-8828.003.patch, HDFS-8828.004.patch, HDFS-8828.005.patch,
> HDFS-8828.006.patch, HDFS-8828.007.patch, HDFS-8828.008.patch,
> HDFS-8828.009.patch, HDFS-8828.010.patch, HDFS-8828.011.patch
>
>
> Some users reported that distcp takes a very long time to build the file
> copy list (30 hours for 1.6M files). We can leverage the snapshot diff
> report to build a file copy list that includes only the files/dirs changed
> between two snapshots (or a snapshot and a normal dir). This speeds up the
> process in two ways:
> 1. less copy list building time.
> 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory
> creation, deletion, rename and modification between two snapshots or a
> snapshot and a normal directory. HDFS-7535 synchronizes deletions and
> renames, then falls back to the default distcp, so it still relies on the
> default distcp to build the complete list of files under the source dir.
> This patch puts only created and modified files into the copy list, based
> on the snapshot diff report. This minimizes the number of files to copy.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
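
For illustration only, the copy-list idea in the description above can be sketched in a few lines of Java. This is not the code from the patch: DiffCopyListSketch, DiffType, DiffEntry and buildCopyList are hypothetical names, and the sketch ignores details such as recursing into newly created directories. It only shows the filtering step, assuming deletions and renames are synchronized separately as in HDFS-7535.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DiffCopyListSketch {
    // Hypothetical stand-ins for snapshot diff entry kinds; not Hadoop's own classes.
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    static final class DiffEntry {
        final DiffType type;
        final String path;

        DiffEntry(DiffType type, String path) {
            this.type = type;
            this.path = path;
        }
    }

    // Keep only created and modified paths. Deletions and renames are
    // assumed to be synchronized by a separate step (as in HDFS-7535).
    static List<String> buildCopyList(List<DiffEntry> report) {
        List<String> copyList = new ArrayList<>();
        for (DiffEntry e : report) {
            if (e.type == DiffType.CREATE || e.type == DiffType.MODIFY) {
                copyList.add(e.path);
            }
        }
        return copyList;
    }

    public static void main(String[] args) {
        List<DiffEntry> report = Arrays.asList(
            new DiffEntry(DiffType.CREATE, "dir/new.txt"),
            new DiffEntry(DiffType.MODIFY, "dir/changed.txt"),
            new DiffEntry(DiffType.DELETE, "dir/old.txt"),
            new DiffEntry(DiffType.RENAME, "dir/moved.txt"));
        // Only the created and modified entries end up in the copy list.
        System.out.println(buildCopyList(report));
    }
}
```

With the four sample entries in main, only dir/new.txt and dir/changed.txt are selected, so the copy list scales with the number of changes rather than with the number of files under the source dir.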