[ https://issues.apache.org/jira/browse/HDFS-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yufei Gu updated HDFS-8828:
---------------------------
    Component/s: snapshots
                 distcp

> Utilize Snapshot diff report to build copy list in distcp
> ---------------------------------------------------------
>
>                 Key: HDFS-8828
>                 URL: https://issues.apache.org/jira/browse/HDFS-8828
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp, snapshots
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>
> Some users have reported that distcp takes a very long time to build the
> file copy list (30 hours with 1.6M files). We can leverage the snapshot
> diff report to build a copy list that includes only the files/dirs that
> changed between two snapshots (or between a snapshot and a normal dir).
> This speeds up the process in two ways:
> 1. less time spent building the copy list.
> 2. fewer file copy MR jobs.
> The HDFS snapshot diff report provides information about file/directory
> creation, deletion, rename and modification between two snapshots, or
> between a snapshot and a normal directory. HDFS-7535 synchronizes deletions
> and renames, then falls back to the default distcp. So it still relies on
> the default distcp to build the copy list, which traverses all files under
> the source dir. This patch will build the copy list based on the snapshot
> diff report.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
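
The idea in the issue description can be illustrated with a small sketch. This is not the HDFS-8828 patch and does not use any Hadoop API. `DiffEntry` and `plan_sync` are hypothetical names, and the `is_dir` flag is an assumption added for illustration. The only thing taken from HDFS is the set of diff type markers (`+` created, `-` deleted, `M` modified, `R` renamed). The sketch assumes deletions and renames are applied on the target separately (as HDFS-7535 does), and it ignores harder cases such as a file that is renamed and then modified, or a newly created directory whose contents must be listed recursively.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DiffEntry:
    """One line of a snapshot diff report (hypothetical model, not a Hadoop class)."""
    kind: str                     # "+", "-", "M" or "R"
    path: str                     # path in the source tree
    target: Optional[str] = None  # new path, only for renames
    is_dir: bool = False          # assumption: the caller knows the entry type


def plan_sync(entries: List[DiffEntry]) -> Tuple[List[str], List[str], List[Tuple[str, str]]]:
    """Split diff entries into a copy list, deletions, and renames.

    Only created paths and modified files go into the copy list, so the
    source tree never has to be traversed in full.
    """
    copy_list: List[str] = []
    deletes: List[str] = []
    renames: List[Tuple[str, str]] = []
    for e in entries:
        if e.kind == "-":
            deletes.append(e.path)
        elif e.kind == "R":
            if e.target is None:
                raise ValueError(f"rename entry without target: {e.path}")
            renames.append((e.path, e.target))
        elif e.kind == "+":
            copy_list.append(e.path)
        elif e.kind == "M":
            # A modified directory only signals that its children changed;
            # those children appear as their own entries.
            if not e.is_dir:
                copy_list.append(e.path)
        else:
            raise ValueError(f"unknown diff type: {e.kind!r}")
    return sorted(copy_list), deletes, renames


entries = [
    DiffEntry("M", "/data", is_dir=True),
    DiffEntry("+", "/data/new.csv"),
    DiffEntry("M", "/data/a.log"),
    DiffEntry("-", "/data/old.tmp"),
    DiffEntry("R", "/data/b.txt", target="/data/c.txt"),
]
copy_list, deletes, renames = plan_sync(entries)
print(copy_list)
```

With these five entries, only two paths end up in the copy list, regardless of how many unchanged files live under `/data`. That is the source of the expected saving in copy list building time.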