[ https://issues.apache.org/jira/browse/HDFS-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15539978#comment-15539978 ]
Hadoop QA commented on HDFS-9820: --------------------------------- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 12s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 44 new + 172 unchanged - 12 fixed = 216 total (was 184) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 55s{color} | {color:green} hadoop-distcp in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 22m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | HDFS-9820 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831237/HDFS-9820.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 113c4e3bd32b 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / fe9ebe2 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16969/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16969/testReport/ | | modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16969/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Improve distcp to support efficient restore to an earlier snapshot > ------------------------------------------------------------------ > > Key: HDFS-9820 > URL: https://issues.apache.org/jira/browse/HDFS-9820 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp > Reporter: Yongjun Zhang > Assignee: Yongjun Zhang > Attachments: HDFS-9820.001.patch, HDFS-9820.002.patch, > HDFS-9820.003.patch, HDFS-9820.004.patch, HDFS-9820.005.patch > > > A common use scenario (scenaio 1): > # create snapshot sx in clusterX, > # do some experiemnts in clusterX, which creates some files. > # throw away the files changed and go back to sx. > Another scenario (scenario 2) is, there is a production cluster and a backup > cluster, we periodically sync up the data from production cluster to the > backup cluster with distcp. > The cluster in scenario 1 could be the backup cluster in scenario 2. > For scenario 1: > HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are > some complexity and challenges. Before that jira is implemented, we count on > distcp to copy from snapshot to the current state. However, the performance > of this operation could be very bad because we have to go through all files > even if we only changed a few files. > For scenario 2: > HDFS-7535 improved distcp performance by avoiding copying files that changed > name since last backup. > On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data > from source to target cluster, by only copying changed files since last > backup. The way it works is use snapshot diff to find out all files changed, > and copy the changed files only. > See > https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/ > This jira is to propose a variation of HDFS-8828, to find out the files > changed in target cluster since last snapshot sx, and copy these from > snapshot sx of either the source or the target cluster, to restore target > cluster's current state to sx. > Specifically, > If a file/dir is > - renamed, rename it back > - created in target cluster, delete it > - modified, put it to the copy list > - run distcp with the copy list, copy from the source cluster's corresponding > snapshot > This could be a new command line switch -rdiff in distcp. > As a native restore feature, HDFS-4167 would still be ideal to have. However, > HDFS-9820 would hopefully be easier to implement, before HDFS-4167 is in > place. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org