[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748198#comment-17748198 ]

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653794909

I agree, hasPathCapability is a good idea. Scheme-based dependencies have problems with viewfs-like setups. Otherwise it may need to depend on resolvePaths anyway to work correctly for viewfs, I believe.

> Support snapshot diff based copylisting for flat paths.
> ---
>
> Key: HDFS-17120
> URL: https://issues.apache.org/jira/browse/HDFS-17120
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Sadanand Shenoy
> Assignee: Sadanand Shenoy
> Priority: Major
> Labels: pull-request-available
>
> Currently, diff-based copyListing, which is used during the distcpSync step
> of an incremental copy, uses the SimpleCopyListing implementation by default.
> That implementation iterates through the DiffReport and, if the DiffType is
> Create and the path is a directory, recursively traverses the directory and
> adds the subpaths to the resultant copyList.
> This works fine for implementations of snapshotDiff that include only
> top-level directories in their DiffReport. If a snapshotDiff implementation
> instead outputs flat paths, including both a directory and its
> sub-directories in the DiffReport, the traversal leads to duplicate paths in
> the copyList and a DuplicateFileException is thrown.
>
> For example, the Ozone filesystem implementation of snapdiff between 2
> snapshots shows all subpaths as part of the diff.
> {code:java}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> + ./dir1
> + ./dir1/dir11
> + ./dir1/dir11/dir111
> {code}
> We can see that even though dir11 & dir111 are subpaths, they are present in
> the snapdiff. This is not the case for HDFS.
> This Jira aims to create a new copyListing impl, used for diff-based
> copyListing, that doesn't traverse the directory but only adds paths that are
> present in the diff.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
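The duplicate-path problem described in the issue can be illustrated with a small, self-contained sketch. This is a toy model and not the DistCp code: the class `FlatDiffListingSketch`, the method `buildCopyList`, and the exception `DuplicatePathException` are hypothetical names made up for illustration, and the directory tree is passed in as a plain map rather than read from a filesystem.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of diff-based copy listing; not the actual DistCp code. */
class FlatDiffListingSketch {

  /** Hypothetical stand-in for the DuplicateFileException named in the issue. */
  static class DuplicatePathException extends RuntimeException {
    DuplicatePathException(String path) {
      super("Duplicate path in copy list: " + path);
    }
  }

  /**
   * Builds a copy list from the paths a snapshot diff reports as created.
   * When traverse is true, every reported directory is also walked
   * recursively, as the issue describes for the default listing.
   */
  static List<String> buildCopyList(List<String> createdPaths,
      Map<String, List<String>> children, boolean traverse) {
    List<String> copyList = new ArrayList<>();
    for (String path : createdPaths) {
      copyList.add(path);
      if (traverse) {
        addDescendants(path, children, copyList);
      }
    }
    // Reject the listing if any path was added more than once.
    Set<String> seen = new HashSet<>();
    for (String path : copyList) {
      if (!seen.add(path)) {
        throw new DuplicatePathException(path);
      }
    }
    return copyList;
  }

  /** Appends all descendants of dir, depth first, using the given child map. */
  private static void addDescendants(String dir,
      Map<String, List<String>> children, List<String> out) {
    for (String child
        : children.getOrDefault(dir, Collections.<String>emptyList())) {
      out.add(child);
      addDescendants(child, children, out);
    }
  }
}
```

With a diff that reports only `./dir1`, traversal produces the three expected paths. With the flat Ozone-style diff from the issue, traversal adds `./dir1/dir11` twice and the sketch throws, while skipping traversal yields exactly the three reported paths.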
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748172#comment-17748172 ]

ASF GitHub Bot commented on HDFS-17120:
---

steveloughran commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653673632

The way to be adaptive across filesystems is to call hasPathCapability() on the FS and ask it what it does, rather than hard-coding various filesystem schemes.
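The difference between scheme matching and the `hasPathCapability()` probe suggested in this comment can be sketched as follows. To stay self-contained, the sketch does not use Hadoop classes: `CapabilityAware` is a hypothetical stand-in for a filesystem that answers capability queries, and the capability name `sketch.snapshotdiff.flat-paths` is invented for illustration and is not taken from the PR.

```java
/** Sketch of capability probing versus scheme matching; all names are illustrative. */
class CapabilityProbeSketch {

  /** Hypothetical stand-in for a filesystem that answers capability queries. */
  interface CapabilityAware {
    boolean hasPathCapability(String path, String capability);
  }

  /** Hypothetical capability name, invented for this sketch. */
  static final String FLAT_SNAPSHOT_DIFF = "sketch.snapshotdiff.flat-paths";

  /** Scheme-based decision: traverse unless the URI looks like Ozone. */
  static boolean traverseByScheme(String uri) {
    return !uri.startsWith("ofs://");
  }

  /** Capability-based decision: ask the filesystem what its diff contains. */
  static boolean traverseByCapability(CapabilityAware fs, String path) {
    return !fs.hasPathCapability(path, FLAT_SNAPSHOT_DIFF);
  }
}
```

Under these assumptions, a `viewfs://` path whose mount resolves to a store with flat diffs is misclassified by `traverseByScheme`, because the URI never shows `ofs://`, whereas `traverseByCapability` gives the right answer as long as the probe reaches the resolved filesystem.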
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747828#comment-17747828 ]

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653093330

Filed HDFS-17131 for the same. Thanks @umamaheswararao, @ayushtkn, @swamirishi.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747825#comment-17747825 ]

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653089202

Thanks @sadanand48 for the contribution. Thanks @swamirishi and @ayushtkn for the reviews.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747823#comment-17747823 ]

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao merged PR #5885:
URL: https://github.com/apache/hadoop/pull/5885
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747822#comment-17747822 ]

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653086998

@sadanand48 could you please file the follow-up JIRAs you were thinking of?
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747808#comment-17747808 ]

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653039629

Thanks @ayushtkn for taking a look.

1. If the config (`traverseDirectories`) is set to false in the case of HDFS, it will cause `DuplicateFileException` and fail the distcp operation. Hence the default is set to true.
2. I agree, this is a good point and can solve the problems you stated below with respect to the same config object being used by both HDFS and Ozone in an application like Hive replication. I will raise a follow-up patch/Jira to solve this case so that the config is deduced from the fs scheme.
3. For `getTraverseExcludeList`: the redundant copy case arises only when we recursively traverse and add paths not present in the diff. Here we are adding all paths from the diff itself, so it wouldn't have any redundant path.
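As a minimal sketch of the "default is true" behaviour from point 1 of this comment, the flag lookup can be modelled with `java.util.Properties`. The class name `TraverseFlagSketch` and the key string are hypothetical: the thread only names the `DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY` label, not the key behind it.

```java
import java.util.Properties;

/** Toy model of a boolean config flag that defaults to true. */
class TraverseFlagSketch {

  /** Hypothetical key; the thread does not show the real key string. */
  static final String TRAVERSE_KEY = "sketch.diff.copylisting.traverse.directory";

  /** Reads the flag, falling back to true when it is unset. */
  static boolean traverseDirectories(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(TRAVERSE_KEY, "true"));
  }
}
```

Because a single configuration object carries one value for the flag, an application that syncs both HDFS and Ozone through a shared configuration cannot suit both stores at once, which is the situation point 2 proposes to address in a follow-up.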
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747602#comment-17747602 ]

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1652194090

Latest changes look good to me. Thanks @sadanand48 for working on this.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747591#comment-17747591 ]

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1652149377

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 34m 11s | | trunk passed |
| +1 :green_heart: | compile | 0m 25s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 0m 24s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 0m 42s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 2s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 19s | | the patch passed |
| +1 :green_heart: | compile | 0m 18s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 0m 18s | | the patch passed |
| +1 :green_heart: | compile | 0m 16s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 0m 16s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/9/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 1 new + 20 unchanged - 0 fixed = 21 total (was 20) |
| +1 :green_heart: | mvnsite | 0m 18s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 17s | | hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 0 new + 37 unchanged - 4 fixed = 37 total (was 41) |
| +1 :green_heart: | javadoc | 0m 16s | | hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 37 unchanged - 4 fixed = 37 total (was 41) |
| +1 :green_heart: | spotbugs | 0m 36s | | the patch passed |
| +1 :green_heart: | shadedclient | 20m 48s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 13m 18s | | hadoop-distcp in the patch passed. |
| +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. |
| | | 99m 36s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/9/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 395508a62168 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / bc3fd7fcf1233aef46acd48f7708a31c126787b8 |
| Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Test Results |
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747560#comment-17747560 ]

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1275096942

## hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java: ##

@@ -167,6 +169,77 @@ public void testDuplicates() {
     }
   }

+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException as it recursively traverses src3 directory
+    // and also adds 3.txt,4.txt twice
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);

Review Comment:
done.

## hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java: ##

@@ -167,6 +169,77 @@ public void testDuplicates() {
     }
   }

+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException as it recursively traverses src3 directory
+    // and also adds 3.txt,4.txt twice
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);
+      Assert.fail("Test failed " + e.getMessage());
+    } finally {
+      TestDistCpUtils.delete(fs, "/tmp");
+    }
+  }
+
+  @Test(timeout=1)
+  public void testDiffBasedSimpleCopyListingWithoutTraverseDirectory() {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // no exception expected in this case
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, false);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);

Review Comment:
done.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747532#comment-17747532 ] ASF GitHub Bot commented on HDFS-17120: --- steveloughran commented on code in PR #5885: URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274999246 ## hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java: ## @@ -167,6 +169,77 @@ public void testDuplicates() { } } + @Test(expected = DuplicateFileException.class, timeout = 1) + public void testDiffBasedSimpleCopyListing() throws IOException { +FileSystem fs = null; +Configuration configuration = getConf(); +DistCpSync distCpSync = Mockito.mock(DistCpSync.class); +Path listingFile = new Path("/tmp/list"); +// Throws DuplicateFileException as it recursively traverses src3 directory +// and also adds 3.txt,4.txt twice +configuration.setBoolean( +DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true); +try { + fs = FileSystem.get(getConf()); + buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile); +} catch (IOException e) { + LOG.error("Exception encountered in test", e); + Assert.fail("Test failed " + e.getMessage()); +} finally { + TestDistCpUtils.delete(fs, "/tmp"); +} + } + + @Test(timeout=1) + public void testDiffBasedSimpleCopyListingWithoutTraverseDirectory() { +FileSystem fs = null; +Configuration configuration = getConf(); +DistCpSync distCpSync = Mockito.mock(DistCpSync.class); +Path listingFile = new Path("/tmp/list"); +// no exception expected in this case +configuration.setBoolean( +DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, false); +try { + fs = FileSystem.get(getConf()); + buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile); +} catch (IOException e) { + LOG.error("Exception encountered in test", e); Review Comment: same. 
## hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java: ## @@ -167,6 +169,77 @@ public void testDuplicates() { } } + @Test(expected = DuplicateFileException.class, timeout = 1) + public void testDiffBasedSimpleCopyListing() throws IOException { +FileSystem fs = null; +Configuration configuration = getConf(); +DistCpSync distCpSync = Mockito.mock(DistCpSync.class); +Path listingFile = new Path("/tmp/list"); +// Throws DuplicateFileException as it recursively traverses src3 directory +// and also adds 3.txt,4.txt twice +configuration.setBoolean( +DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true); +try { + fs = FileSystem.get(getConf()); + buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile); +} catch (IOException e) { + LOG.error("Exception encountered in test", e); Review Comment: no need to catch; just throw
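The review suggestion above ("no need to catch; just throw") can be illustrated with a minimal, dependency-free sketch. Everything here is hypothetical: `buildListing` is a stand-in for the listing step in the test, the `AssertionError` only mimics what a fail-with-message call would raise, and `cleanups` stands in for the cleanup done in the `finally` block. It is not the code from the PR.

```java
import java.io.IOException;

public class PropagateExceptionSketch {
  // Counts how often the finally-block cleanup ran (stand-in for deleting /tmp).
  static int cleanups = 0;

  // Hypothetical stand-in for the listing step; fails on request.
  static void buildListing(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("listing failed");
    }
  }

  // Style in the reviewed patch: catch, then convert to a failure.
  // Only the message text survives; the original exception is not attached.
  static void catchAndFail(boolean fail) {
    try {
      buildListing(fail);
    } catch (IOException e) {
      throw new AssertionError("Test failed " + e.getMessage());
    } finally {
      cleanups++;
    }
  }

  // Style suggested in review: declare the exception and let it propagate.
  // The finally block still runs, so cleanup is not lost.
  static void justThrow(boolean fail) throws IOException {
    try {
      buildListing(fail);
    } finally {
      cleanups++;
    }
  }

  public static void main(String[] args) {
    try {
      justThrow(true);
    } catch (IOException e) {
      System.out.println("propagated: " + e);
    }
    try {
      catchAndFail(true);
    } catch (AssertionError e) {
      System.out.println("converted: " + e + ", cause=" + e.getCause());
    }
    System.out.println("cleanups run: " + cleanups);
  }
}
```

Letting the exception propagate keeps the original exception type and stack trace in the test report, and the test runner already treats an unexpected exception as a failure, so the catch block adds nothing.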
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747480#comment-17747480 ] ASF GitHub Bot commented on HDFS-17120: --- hadoop-yetus commented on PR #5885: URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1651686956 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 29s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 35s | | trunk passed | | +1 :green_heart: | compile | 0m 26s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | compile | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | +1 :green_heart: | checkstyle | 0m 24s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 28s | | trunk passed | | +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | +1 :green_heart: | spotbugs | 0m 40s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 39s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 19s | | the patch passed | | +1 :green_heart: | compile | 0m 18s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javac | 0m 18s | | the patch passed | | +1 :green_heart: | compile | 0m 16s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | +1 :green_heart: | javac | 0m 16s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/8/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 1 new + 20 unchanged - 0 fixed = 21 total (was 20) | | +1 :green_heart: | mvnsite | 0m 18s | | the patch passed | | +1 :green_heart: | javadoc | 0m 16s | | hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 0 new + 37 unchanged - 4 fixed = 37 total (was 41) | | +1 :green_heart: | javadoc | 0m 16s | | hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 37 unchanged - 4 fixed = 37 total (was 41) | | +1 :green_heart: | spotbugs | 0m 37s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 55s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 13m 20s | | hadoop-distcp in the patch passed. | | +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. 
| | | | 96m 22s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5885 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux b944c168be7f 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 55d1196a354d68045d8d2452eb722671928a3d65 | | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | Test Results |
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747379#comment-17747379 ] ASF GitHub Bot commented on HDFS-17120: --- hadoop-yetus commented on PR #5885: URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1651219900 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 43s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 45m 47s | | trunk passed | | +1 :green_heart: | compile | 0m 33s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | +1 :green_heart: | checkstyle | 0m 34s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 37s | | trunk passed | | +1 :green_heart: | javadoc | 0m 38s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 33s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | +1 :green_heart: | spotbugs | 0m 57s | | trunk passed | | +1 :green_heart: | shadedclient | 33m 55s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | -1 :x: | mvninstall | 0m 20s | [/patch-mvninstall-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-mvninstall-hadoop-tools_hadoop-distcp.txt) | hadoop-distcp in the patch failed. | | -1 :x: | compile | 0m 20s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1. | | -1 :x: | javac | 0m 20s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1. | | -1 :x: | compile | 0m 19s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) | hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09. | | -1 :x: | javac | 0m 19s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) | hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09. | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. 
| | -0 :warning: | checkstyle | 0m 19s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 1 new + 44 unchanged - 0 fixed = 45 total (was 44) | | -1 :x: | mvnsite | 0m 21s | [/patch-mvnsite-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-mvnsite-hadoop-tools_hadoop-distcp.txt) | hadoop-distcp in the patch failed. | | -1 :x: | javadoc | 0m 20s | [/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-distcp in the
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747344#comment-17747344 ] ASF GitHub Bot commented on HDFS-17120: --- hadoop-yetus commented on PR #5885: URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1651156351 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 27s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 34m 59s | | trunk passed | | +1 :green_heart: | compile | 0m 25s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | compile | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | +1 :green_heart: | checkstyle | 0m 24s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 27s | | trunk passed | | +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 | | +1 :green_heart: | spotbugs | 0m 41s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 37s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | -1 :x: | mvninstall | 0m 15s | [/patch-mvninstall-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-mvninstall-hadoop-tools_hadoop-distcp.txt) | hadoop-distcp in the patch failed. | | -1 :x: | compile | 0m 15s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1. | | -1 :x: | javac | 0m 15s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1. | | -1 :x: | compile | 0m 15s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) | hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09. | | -1 :x: | javac | 0m 15s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) | hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09. | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. 
| | -0 :warning: | checkstyle | 0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 1 new + 44 unchanged - 0 fixed = 45 total (was 44) | | -1 :x: | mvnsite | 0m 15s | [/patch-mvnsite-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-mvnsite-hadoop-tools_hadoop-distcp.txt) | hadoop-distcp in the patch failed. | | -1 :x: | javadoc | 0m 16s | [/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-distcp in the
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747301#comment-17747301 ] ASF GitHub Bot commented on HDFS-17120: --- sadanand48 commented on code in PR #5885: URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274432347 ## hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java: ## @@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration, } } + /** + * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved. + * @param configuration The input configuration. + * @param credentials Credentials object on which the FS delegation tokens are cached + * @param distCpSync DistcpSync object used to sync diffs between source and target. + * @return An instance of the appropriate CopyListing implementation. + * @throws java.io.IOException - Exception if any + */ + public static CopyListing getDiffCopyListing(Configuration configuration, + Credentials credentials, DistCpSync distCpSync) throws IOException { +String copyListingClassName = Review Comment: Removed this now > Support snapshot diff based copylisting for flat paths. > --- > > Key: HDFS-17120 > URL: https://issues.apache.org/jira/browse/HDFS-17120 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sadanand Shenoy >Assignee: Sadanand Shenoy >Priority: Major > Labels: pull-request-available > > Currently, the diff-based copyListing used during the distcpSync step > of an incremental copy uses the SimpleCopyListing implementation by > default. That implementation iterates through the DiffReport and, if the > DiffType is Create and the path is a directory, recursively traverses the > directory and adds the subpaths to the resultant copyList. > This works fine for implementations of snapshotDiff that include only > top-level directories in their DiffReport. 
If a snapshotDiff > implementation outputs flat paths, that is, both the directory and its > sub-directory subpaths appear in the DiffReport, the traversal leads to duplicate paths in > the copyList and a DuplicateFileException is thrown. > > For example, the > Ozone filesystem implementation of snapdiff between two snapshots shows all > subpaths as part of the diff. > {code:java} > [~]# ozone sh snapshot create vol11/buck1 snap1 > [~]# ozone sh snapshot create vol11/buck2 snap1 > [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1 > [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11 > [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111 > [~]# ozone sh snapshot create vol11/buck1 snap2 > [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2 > Difference between snapshot: snap1 and snapshot: snap2 > + ./dir1 > + ./dir1/dir11 > + ./dir1/dir11/dir111 {code} > We can see that even though dir11 and dir111 are subpaths, they are present in > the snapdiff. This is not the case for HDFS, though. > This Jira aims to create a new copyListing impl, used for diff-based > copyListing, that doesn't traverse the directory but only adds paths that are > present in its diff. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
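The duplicate-path problem described in the issue can be reproduced without a cluster. The following is a minimal sketch under stated assumptions: a toy in-memory directory tree replaces the source FileSystem, the diff is reduced to a list of created paths, and the boolean `traverseDirectory` stands in for the flag discussed in the PR. The class and method names are hypothetical and are not DistCp's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FlatDiffListingSketch {
  // Toy directory tree: directory path -> direct children (assumption, replaces the FileSystem).
  static final Map<String, List<String>> CHILDREN = new HashMap<>();
  static {
    CHILDREN.put("dir1", Arrays.asList("dir1/dir11"));
    CHILDREN.put("dir1/dir11", Arrays.asList("dir1/dir11/dir111"));
  }

  // Adds every created path from the diff; optionally walks each one recursively.
  static List<String> buildListing(List<String> createdPaths, boolean traverseDirectory) {
    List<String> listing = new ArrayList<>();
    for (String path : createdPaths) {
      listing.add(path);
      if (traverseDirectory) {
        addDescendants(path, listing);
      }
    }
    return listing;
  }

  // Depth-first walk that appends all descendants of dir to the listing.
  static void addDescendants(String dir, List<String> listing) {
    for (String child : CHILDREN.getOrDefault(dir, Collections.<String>emptyList())) {
      listing.add(child);
      addDescendants(child, listing);
    }
  }

  // True if any path occurs more than once, the condition that triggers the duplicate error.
  static boolean hasDuplicates(List<String> listing) {
    Set<String> seen = new HashSet<>();
    for (String path : listing) {
      if (!seen.add(path)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> topLevelDiff = Arrays.asList("dir1");
    List<String> flatDiff = Arrays.asList("dir1", "dir1/dir11", "dir1/dir11/dir111");

    System.out.println(hasDuplicates(buildListing(topLevelDiff, true)));
    System.out.println(hasDuplicates(buildListing(flatDiff, true)));
    System.out.println(hasDuplicates(buildListing(flatDiff, false)));
  }
}
```

With a top-level-only diff, traversal is needed to pick up the subdirectories and produces no duplicates. With a flat diff, traversal re-adds paths the diff already lists, while disabling traversal yields each path exactly once.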
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747300#comment-17747300 ] ASF GitHub Bot commented on HDFS-17120: --- sadanand48 commented on PR #5885: URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1651055042 Thanks @umamaheswararao @swamirishi for the review, I have now updated the patch to use a flag instead of a new copyListing type. Please take a look.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747298#comment-17747298 ] ASF GitHub Bot commented on HDFS-17120: --- sadanand48 commented on code in PR #5885: URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274430970 ## hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java: ## @@ -167,6 +169,77 @@ public void testDuplicates() { } } + @Test(expected = DuplicateFileException.class, timeout = 1) + public void testDiffBasedSimpleCopyListing() throws IOException { +FileSystem fs = null; +Configuration configuration = getConf(); +DistCpSync distCpSync = Mockito.mock(DistCpSync.class); +Path listingFile = new Path("/tmp/list"); +// Throws DuplicateFileException when copyListing is SimpleCopyListing +// as it recursively traverses src3 directory and also adds 3.txt,4.txt twice +configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS, +SimpleCopyListing.class.getName()); Review Comment: HDFS to Ozone -> traverseDirectory - true as hdfs diff contains only top level dirs Ozone to HDFS -> traverseDirectory - false as ozone diff contains all subpaths too
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747297#comment-17747297 ] ASF GitHub Bot commented on HDFS-17120: --- sadanand48 commented on code in PR #5885: URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274429488 ## hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java: ## @@ -316,6 +303,42 @@ protected void doBuildListingWithSnapshotDiff( } } + /** + * Handle create Diffs and add to the copyList. + * If the path is a directory, iterate it recursively and add the paths + * to the result copyList. + * + * @param fileListWriter the list for holding processed results + * @param context The DistCp context with associated input options + * @param sourceRoot The rootDir of the source snapshot + * @param sourceFS the source Filesystem + * @param fileStatuses store the result fileStatuses to add to the copyList + * @param diff the SnapshotDiff report + * @throws IOException + */ + protected void addCreateDiffsToFileListing(SequenceFile.Writer fileListWriter, + DistCpContext context, Path sourceRoot, FileSystem sourceFS, + List fileStatuses, DiffInfo diff) throws IOException { +addToFileListing(fileListWriter, sourceRoot, diff.getTarget(), context); + +FileStatus sourceStatus = sourceFS.getFileStatus(diff.getTarget()); Review Comment: I have now changed it to use a flag now.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747296#comment-17747296 ] ASF GitHub Bot commented on HDFS-17120: --- sadanand48 commented on code in PR #5885: URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274428899 ## hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java: ## @@ -316,6 +303,42 @@ protected void doBuildListingWithSnapshotDiff( } } + /** + * Handle create Diffs and add to the copyList. + * If the path is a directory, iterate it recursively and add the paths + * to the result copyList. + * + * @param fileListWriter the list for holding processed results + * @param context The DistCp context with associated input options + * @param sourceRoot The rootDir of the source snapshot + * @param sourceFS the source Filesystem + * @param fileStatuses store the result fileStatuses to add to the copyList + * @param diff the SnapshotDiff report + * @throws IOException + */ + protected void addCreateDiffsToFileListing(SequenceFile.Writer fileListWriter, + DistCpContext context, Path sourceRoot, FileSystem sourceFS, + List fileStatuses, DiffInfo diff) throws IOException { +addToFileListing(fileListWriter, sourceRoot, diff.getTarget(), context); + +FileStatus sourceStatus = sourceFS.getFileStatus(diff.getTarget()); Review Comment: Yes, I didn't do so to change existing behaviour of SimpleCopyListing. I am good with using a flag here too.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747293#comment-17747293 ]

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274428340

##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
     }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.

Review Comment:
   Done.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747236#comment-17747236 ]

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274301605

##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java:
##
@@ -316,6 +303,42 @@ protected void doBuildListingWithSnapshotDiff(
     }
   }
 
+  /**
+   * Handle create Diffs and add to the copyList.
+   * If the path is a directory, iterate it recursively and add the paths
+   * to the result copyList.
+   *
+   * @param fileListWriter the list for holding processed results
+   * @param context The DistCp context with associated input options
+   * @param sourceRoot The rootDir of the source snapshot
+   * @param sourceFS the source Filesystem
+   * @param fileStatuses store the result fileStatuses to add to the copyList
+   * @param diff the SnapshotDiff report
+   * @throws IOException
+   */
+  protected void addCreateDiffsToFileListing(SequenceFile.Writer fileListWriter,
+      DistCpContext context, Path sourceRoot, FileSystem sourceFS,
+      List fileStatuses, DiffInfo diff) throws IOException {
+    addToFileListing(fileListWriter, sourceRoot, diff.getTarget(), context);
+
+    FileStatus sourceStatus = sourceFS.getFileStatus(diff.getTarget());

Review Comment:
   Have you thought about just having an advanced flag to control this? I am not sure we will have many implementations of these copyListings. I am not against the current design; it is just a thought to keep things simple.

##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,77 @@ public void testDuplicates() {
     }
   }
 
+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException when copyListing is SimpleCopyListing
+    // as it recursively traverses src3 directory and also adds 3.txt,4.txt twice
+    configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+        SimpleCopyListing.class.getName());

Review Comment:
   How should this config be set when we want to copy from HDFS to Ozone or vice versa?
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747234#comment-17747234 ]

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274292957

##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
     }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.

Review Comment:
   Please remove the fully qualified package name here. I don't think we have any other custom IOException class.
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747233#comment-17747233 ]

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274291510

##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
     }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.
+   * @throws java.io.IOException - Exception if any
+   */
+  public static CopyListing getDiffCopyListing(Configuration configuration,
+      Credentials credentials, DistCpSync distCpSync) throws IOException {
+    String copyListingClassName =

Review Comment:
   Why are we assigning a default from the config here? You always reassign it with copyListingClass.getName(), right?
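The point raised in this review comment (resolve the class once, with a default, and take the name from the resolved class) can be sketched with JDK reflection only. Everything here is an assumption made for illustration: DiffListingFactorySketch, its Listing interface, the two nested implementations, the configuration key, and the use of a Map in place of Hadoop's Configuration are all hypothetical and do not reflect the final shape of the PR.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.util.Map;

/** Hypothetical reflective factory; a Map stands in for Hadoop's Configuration. */
final class DiffListingFactorySketch {

  /** Hypothetical configuration key, not a real DistCp constant. */
  static final String DIFF_LISTING_CLASS_KEY = "sketch.diff.copy.listing.class";

  /** Minimal stand-in for the CopyListing abstraction. */
  interface Listing {
    String describe();
  }

  /** Default implementation, playing the role of the recursive listing. */
  static final class RecursiveListing implements Listing {
    RecursiveListing(Map<String, String> conf) { }
    @Override public String describe() { return "recursive"; }
  }

  /** Alternative implementation, playing the role of a flat listing. */
  static final class FlatListing implements Listing {
    FlatListing(Map<String, String> conf) { }
    @Override public String describe() { return "flat"; }
  }

  /**
   * Resolves the implementation class once (falling back to the default)
   * and instantiates it through its (Map) constructor.
   */
  static Listing create(Map<String, String> conf) throws IOException {
    // Single lookup with a default, so the name used in error messages
    // always matches the class that was actually requested.
    String className =
        conf.getOrDefault(DIFF_LISTING_CLASS_KEY, RecursiveListing.class.getName());
    try {
      Class<? extends Listing> clazz =
          Class.forName(className).asSubclass(Listing.class);
      Constructor<? extends Listing> constructor =
          clazz.getDeclaredConstructor(Map.class);
      return constructor.newInstance(conf);
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IOException("Unable to instantiate " + className, e);
    }
  }
}
```

An empty map yields the default implementation; setting the key to another class name that has a matching constructor selects that class, and an unknown or incompatible class name surfaces as an IOException.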
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747209#comment-17747209 ]

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1650684670

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 45m 54s | | trunk passed |
| +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 0m 33s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 38s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 0m 58s | | trunk passed |
| +1 :green_heart: | shadedclient | 33m 32s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 27s | | the patch passed |
| +1 :green_heart: | compile | 0m 24s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 0m 24s | | the patch passed |
| +1 :green_heart: | compile | 0m 23s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 0m 23s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 18s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/5/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 1 new + 44 unchanged - 0 fixed = 45 total (was 44) |
| +1 :green_heart: | mvnsite | 0m 26s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 23s | | hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41) |
| +1 :green_heart: | javadoc | 0m 22s | | hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41) |
| +1 :green_heart: | spotbugs | 0m 50s | | the patch passed |
| +1 :green_heart: | shadedclient | 33m 33s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 15m 8s | | hadoop-distcp in the patch passed. |
| +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 141m 30s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/5/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 5cd8625b1112 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 70ef6eb851e72186768c8d284c93e201ff05637f |
| Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Test Results |
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747164#comment-17747164 ]

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274102593

##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
     }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.
+   * @throws java.io.IOException - Exception if any
+   */
+  public static CopyListing getDiffCopyListing(Configuration configuration,
+      Credentials credentials, DistCpSync distCpSync) throws IOException {
+    String copyListingClassName =
+        configuration.get(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+            "");
+    try {
+      Class copyListingClass =
+          configuration.getClass(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+              SimpleCopyListing.class, SimpleCopyListing.class);
+      copyListingClassName = copyListingClass.getName();
+      Constructor constructor =
+          copyListingClass.getDeclaredConstructor(Configuration.class,
+              Credentials.class, DistCpSync.class);
+      return constructor.newInstance(configuration, credentials,distCpSync);

Review Comment:
   Done.

##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,66 @@ public void testDuplicates() {
     }
   }
 
+  @Test(timeout=1)
+  public void testFlatDiffCopyListing() {
+    FileSystem fs = null;
+    try {
+      fs = FileSystem.get(getConf());
+      List srcPaths = new ArrayList();
+      srcPaths.add(new Path("/tmp/in"));
+      TestDistCpUtils.createFile(fs, "/tmp/in/src1/1.txt");
+      TestDistCpUtils.createFile(fs, "/tmp/in/src2/1.txt");
+      TestDistCpUtils.createFile(fs, "/tmp/in/src3/3.txt");
+      TestDistCpUtils.createFile(fs, "/tmp/in/src3/4.txt");
+      Path target = new Path("/tmp/out");
+      Path listingFile = new Path("/tmp/list");
+      // adding below flags useDiff & sync only to enable context.shouldUseSnapshotDiff()
+      final DistCpOptions options = new DistCpOptions.Builder(srcPaths, target)
+          .withUseDiff("snap1","snap2")
+          .withSyncFolder(true)
+          .build();
+      final DistCpContext context = new DistCpContext(options);
+      Configuration configuration = getConf();
+      configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+          FlatDiffCopyListing.class.getName());
+      DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+      // Create a dummy DiffInfo List that contains a directory + paths inside
+      // that directory as part of the diff.
+
+      ArrayList diffs = new ArrayList<>();
+      diffs.add(
+          new DiffInfo(new Path("/tmp/in/src3/"), new Path("/tmp/in/src3/"),
+              SnapshotDiffReport.DiffType.CREATE));
+      diffs.add(new DiffInfo(new Path("/tmp/in/src3/3.txt"),
+          new Path("/tmp/in/src3/3.txt"), SnapshotDiffReport.DiffType.CREATE));
+      diffs.add(new DiffInfo(new Path("/tmp/in/src3/4.txt"),
+          new Path("/tmp/in/src3/4.txt"), SnapshotDiffReport.DiffType.CREATE));
+      Mockito.when(distCpSync.prepareDiffListForCopyListing()).thenReturn(diffs);
+
+      CopyListing listing =
+          CopyListing.getDiffCopyListing(configuration, CREDENTIALS,distCpSync);
+      // won't throw DuplicateFileException as copyListing is FlatDiffCopyListing.
+      listing.buildListing(listingFile, context);
+
+      // Throws DuplicateFileException when copyListing is SimpleCopyListing
+      // as it recursively traverses src3 directory and also adds 3.txt,4.txt twice
+      configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,SimpleCopyListing.class.getName());
+      try{
+        listing =
+            CopyListing.getDiffCopyListing(configuration, CREDENTIALS,distCpSync);

Review Comment:
   Done.

##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,66 @@ public void testDuplicates() {
     }
   }
 
+  @Test(timeout=1)
+  public void testFlatDiffCopyListing() {
+    FileSystem fs = null;
+    try {
+      fs = FileSystem.get(getConf());
+      List srcPaths = new ArrayList();
+      srcPaths.add(new Path("/tmp/in"));
+      TestDistCpUtils.createFile(fs, "/tmp/in/src1/1.txt");
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747052#comment-17747052 ]

ASF GitHub Bot commented on HDFS-17120:
---

swamirishi commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1273728074

##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
     }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.
+   * @throws java.io.IOException - Exception if any
+   */
+  public static CopyListing getDiffCopyListing(Configuration configuration,
+      Credentials credentials, DistCpSync distCpSync) throws IOException {
+    String copyListingClassName =
+        configuration.get(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+            "");
+    try {
+      Class copyListingClass =
+          configuration.getClass(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+              SimpleCopyListing.class, SimpleCopyListing.class);
+      copyListingClassName = copyListingClass.getName();
+      Constructor constructor =
+          copyListingClass.getDeclaredConstructor(Configuration.class,
+              Credentials.class, DistCpSync.class);
+      return constructor.newInstance(configuration, credentials,distCpSync);

Review Comment:
   ```suggestion
         return constructor.newInstance(configuration, credentials, distCpSync);
   ```

##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,66 @@ public void testDuplicates() {
     }
   }
 
+  @Test(timeout=1)
+  public void testFlatDiffCopyListing() {
+    FileSystem fs = null;
+    try {
+      fs = FileSystem.get(getConf());
+      List srcPaths = new ArrayList();
+      srcPaths.add(new Path("/tmp/in"));
+      TestDistCpUtils.createFile(fs, "/tmp/in/src1/1.txt");
+      TestDistCpUtils.createFile(fs, "/tmp/in/src2/1.txt");
+      TestDistCpUtils.createFile(fs, "/tmp/in/src3/3.txt");
+      TestDistCpUtils.createFile(fs, "/tmp/in/src3/4.txt");
+      Path target = new Path("/tmp/out");
+      Path listingFile = new Path("/tmp/list");
+      // adding below flags useDiff & sync only to enable context.shouldUseSnapshotDiff()
+      final DistCpOptions options = new DistCpOptions.Builder(srcPaths, target)
+          .withUseDiff("snap1","snap2")
+          .withSyncFolder(true)
+          .build();
+      final DistCpContext context = new DistCpContext(options);
+      Configuration configuration = getConf();
+      configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+          FlatDiffCopyListing.class.getName());
+      DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+      // Create a dummy DiffInfo List that contains a directory + paths inside
+      // that directory as part of the diff.
+
+      ArrayList diffs = new ArrayList<>();
+      diffs.add(
+          new DiffInfo(new Path("/tmp/in/src3/"), new Path("/tmp/in/src3/"),
+              SnapshotDiffReport.DiffType.CREATE));
+      diffs.add(new DiffInfo(new Path("/tmp/in/src3/3.txt"),
+          new Path("/tmp/in/src3/3.txt"), SnapshotDiffReport.DiffType.CREATE));
+      diffs.add(new DiffInfo(new Path("/tmp/in/src3/4.txt"),
+          new Path("/tmp/in/src3/4.txt"), SnapshotDiffReport.DiffType.CREATE));
+      Mockito.when(distCpSync.prepareDiffListForCopyListing()).thenReturn(diffs);
+
+      CopyListing listing =
+          CopyListing.getDiffCopyListing(configuration, CREDENTIALS,distCpSync);
+      // won't throw DuplicateFileException as copyListing is FlatDiffCopyListing.
+      listing.buildListing(listingFile, context);
+
+      // Throws DuplicateFileException when copyListing is SimpleCopyListing
+      // as it recursively traverses src3 directory and also adds 3.txt,4.txt twice
+      configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,SimpleCopyListing.class.getName());
+      try{
+        listing =
+            CopyListing.getDiffCopyListing(configuration, CREDENTIALS,distCpSync);

Review Comment:
   ```suggestion
               CopyListing.getDiffCopyListing(configuration, CREDENTIALS, distCpSync);
   ```

##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,66 @@ public void testDuplicates() {
     }
   }
 
+  @Test(timeout=1)
+  public void testFlatDiffCopyListing() {
+    FileSystem fs = null;
+    try {
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746999#comment-17746999 ]

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1649840700

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 30s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 22s | | trunk passed |
| +1 :green_heart: | compile | 0m 26s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 0m 24s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 28s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 27s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 0m 41s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 0s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 19s | | the patch passed |
| +1 :green_heart: | compile | 0m 18s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 0m 18s | | the patch passed |
| +1 :green_heart: | compile | 0m 16s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 0m 16s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/4/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 7 new + 44 unchanged - 0 fixed = 51 total (was 44) |
| +1 :green_heart: | mvnsite | 0m 18s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 17s | | hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41) |
| +1 :green_heart: | javadoc | 0m 16s | | hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41) |
| +1 :green_heart: | spotbugs | 0m 36s | | the patch passed |
| +1 :green_heart: | shadedclient | 20m 42s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 13m 20s | | hadoop-distcp in the patch passed. |
| +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. |
| | | 98m 4s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux ee6b33cd79ff 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 47e79e6ddd212e193c69fb0e09e43fee3a9d8dd1 |
| Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| Test Results |
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746929#comment-17746929 ]

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1649663453

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 37s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 48m 50s | | trunk passed |
| +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 0m 32s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 0m 33s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 38s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 0m 55s | | trunk passed |
| +1 :green_heart: | shadedclient | 33m 59s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 25s | | the patch passed |
| +1 :green_heart: | compile | 0m 23s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 0m 23s | | the patch passed |
| +1 :green_heart: | compile | 0m 21s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 0m 21s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 19s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/3/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 7 new + 44 unchanged - 0 fixed = 51 total (was 44) |
| +1 :green_heart: | mvnsite | 0m 25s | | the patch passed |
| -1 :x: | javadoc | 0m 22s | [/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/3/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 1 new + 39 unchanged - 2 fixed = 40 total (was 41) |
| +1 :green_heart: | javadoc | 0m 22s | | hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 40 unchanged - 1 fixed = 40 total (was 41) |
| +1 :green_heart: | spotbugs | 0m 50s | | the patch passed |
| +1 :green_heart: | shadedclient | 33m 44s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 15m 2s | | hadoop-distcp in the patch passed. |
| +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. |
| | | 144m 51s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux f1e2dd4127b0 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / b33ca7e27eade12523d3f6169b43ffc6f914accf |
| Default Java | Private
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746826#comment-17746826 ]

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1649328587

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 27s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 41s | | trunk passed |
| +1 :green_heart: | compile | 0m 25s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 0m 23s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 0m 24s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 0m 41s | | trunk passed |
| +1 :green_heart: | shadedclient | 20m 59s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 19s | | the patch passed |
| +1 :green_heart: | compile | 0m 18s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 0m 18s | | the patch passed |
| +1 :green_heart: | compile | 0m 16s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 0m 16s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 7 new + 42 unchanged - 0 fixed = 49 total (was 42) |
| +1 :green_heart: | mvnsite | 0m 18s | | the patch passed |
| -1 :x: | javadoc | 0m 17s | [/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/2/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 8 new + 40 unchanged - 1 fixed = 48 total (was 41) |
| -1 :x: | javadoc | 0m 16s | [/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/2/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) | hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 8 new + 40 unchanged - 1 fixed = 48 total (was 41) |
| +1 :green_heart: | spotbugs | 0m 34s | | the patch passed |
| +1 :green_heart: | shadedclient | 20m 54s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 13m 15s | | hadoop-distcp in the patch passed. |
| +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. |
| | | 98m 29s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746795#comment-17746795 ]

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1649205250

@ayushtkn @steveloughran @umamaheswararao Can I please get a review?
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746713#comment-17746713 ]

ASF GitHub Bot commented on HDFS-17120:
---

prashantpogde commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1648796679

@umamaheswararao Can you please take a look?
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746645#comment-17746645 ]

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1648595933

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 30s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 35m 15s | | trunk passed |
| +1 :green_heart: | compile | 0m 25s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 0m 23s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | checkstyle | 0m 24s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 0m 25s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | spotbugs | 0m 42s | | trunk passed |
| +1 :green_heart: | shadedclient | 20m 53s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 18s | | the patch passed |
| +1 :green_heart: | compile | 0m 17s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 0m 17s | | the patch passed |
| +1 :green_heart: | compile | 0m 16s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
| +1 :green_heart: | javac | 0m 16s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 7 new + 42 unchanged - 0 fixed = 49 total (was 42) |
| +1 :green_heart: | mvnsite | 0m 18s | | the patch passed |
| -1 :x: | javadoc | 0m 17s | [/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/1/artifact/out/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) | hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1. |
| -1 :x: | javadoc | 0m 16s | [/patch-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/1/artifact/out/patch-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) | hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09. |
| +1 :green_heart: | spotbugs | 0m 36s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 48s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 14m 14s | | hadoop-distcp in the patch passed. |
| +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. |
| | | 105m 52s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 8407f2688740 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk /
[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.
[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746608#comment-17746608 ]

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 opened a new pull request, #5885:
URL: https://github.com/apache/hadoop/pull/5885

### Description of PR

Currently, the diff-based copyListing used during the distcpSync step of an incremental copy defaults to the SimpleCopyListing implementation. That implementation iterates through the DiffReport and, if the DiffType is Create and the path is a directory, recursively traverses the directory and adds the subpaths to the resulting copyList.

This PR adds a copyListing implementation that considers only the flat paths in the snapshotDiff report and doesn't traverse directories recursively. Existing behaviour is unaffected: the default copyListing implementation for diff-based copy is still SimpleCopyListing, but it can be overridden through a config if desired.

https://issues.apache.org/jira/browse/HDFS-17120

### How was this patch tested?

### For code changes:

Added Unit tests
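
To illustrate why recursive traversal conflicts with a flat diff report, here is a minimal, self-contained sketch. It does not use any Hadoop or DistCp classes: `FlatDiffListingSketch`, `sampleTree`, `recursiveListing`, `flatListing` and `duplicates` are hypothetical names invented for this illustration, and the paths are the three directories from the Ozone snapshot diff example in the issue description. It only models the listing logic described in the PR, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the two listing strategies; not DistCp code.
final class FlatDiffListingSketch {

  // Stand-in for the source file system: parent path -> direct children.
  public static Map<String, List<String>> sampleTree() {
    Map<String, List<String>> tree = new HashMap<>();
    tree.put("dir1", Arrays.asList("dir1/dir11"));
    tree.put("dir1/dir11", Arrays.asList("dir1/dir11/dir111"));
    return tree;
  }

  // Adds the path itself, then everything below it.
  private static void addRecursively(String path,
      Map<String, List<String>> tree, List<String> out) {
    out.add(path);
    for (String child : tree.getOrDefault(path, Collections.<String>emptyList())) {
      addRecursively(child, tree, out);
    }
  }

  // Recursive strategy: every created directory in the diff is expanded.
  public static List<String> recursiveListing(List<String> createdPaths,
      Map<String, List<String>> tree) {
    List<String> out = new ArrayList<>();
    for (String path : createdPaths) {
      addRecursively(path, tree, out);
    }
    return out;
  }

  // Flat strategy: only the paths present in the diff are listed.
  public static List<String> flatListing(List<String> createdPaths) {
    return new ArrayList<>(createdPaths);
  }

  // Returns each entry that was already seen earlier in the listing.
  public static List<String> duplicates(List<String> listing) {
    Set<String> seen = new HashSet<>();
    List<String> dups = new ArrayList<>();
    for (String path : listing) {
      if (!seen.add(path)) {
        dups.add(path);
      }
    }
    return dups;
  }

  public static void main(String[] args) {
    // A flat diff report lists the directory and all of its sub-directories.
    List<String> diff = Arrays.asList("dir1", "dir1/dir11", "dir1/dir11/dir111");
    System.out.println("recursive duplicates: "
        + duplicates(recursiveListing(diff, sampleTree())));
    System.out.println("flat duplicates: " + duplicates(flatListing(diff)));
  }
}
```

With a flat diff report, the recursive strategy revisits `dir1/dir11` and `dir1/dir11/dir111` while expanding their parents, which corresponds to the situation where the issue reports a DuplicateFileException. The flat strategy lists each reported path once.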