[ https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747825#comment-17747825 ]
ASF GitHub Bot commented on HDFS-17120:
---------------------------------------

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653089202

   Thanks @sadanand48 for the contribution. Thanks @swamirishi and @ayushtkn for the reviews.

> Support snapshot diff based copylisting for flat paths.
> -------------------------------------------------------
>
>                 Key: HDFS-17120
>                 URL: https://issues.apache.org/jira/browse/HDFS-17120
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sadanand Shenoy
>            Assignee: Sadanand Shenoy
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the diff-based copyListing used during the distcpSync step
> of an incremental copy defaults to the SimpleCopyListing implementation.
> That implementation iterates through the DiffReport and, if the
> DiffType is Create and the path is a directory, recursively traverses the
> directory and adds its subpaths to the resultant copyList.
> This works fine for implementations of snapshotDiff that include only
> top-level directories in the DiffReport. If a snapshotDiff
> implementation instead outputs flat paths, so that the DiffReport includes
> both a directory and its sub-directory subpaths, the recursive traversal
> adds the same subpaths again. This leads to duplicate paths in
> the copyList and a DuplicateFileException is thrown.
>
> For example, the Ozone filesystem implementation of snapdiff between two
> snapshots shows all subpaths as part of the diff.
> {code:java}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> + ./dir1
> + ./dir1/dir11
> + ./dir1/dir11/dir111 {code}
> Even though dir11 and dir111 are subpaths of dir1, they are present in
> the snapdiff output. This is not the case for HDFS.
> This Jira aims to create a new copyListing implementation for diff-based
> copyListing that does not traverse the directory and only adds the paths
> that are present in the diff.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
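
The duplicate-path problem described in the issue can be illustrated with a small, self-contained sketch. It does not use the real DistCp or SimpleCopyListing classes: the class `FlatDiffListingSketch`, the `CHILDREN` map, and the methods `recursiveListing`, `flatListing` and `hasDuplicates` are all hypothetical names, and the toy namespace only mirrors the dir1/dir11/dir111 example above. It is meant to show the idea under those assumptions, not the implementation merged in PR #5885.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration; not a Hadoop or DistCp class.
class FlatDiffListingSketch {

    // Toy namespace: directory -> direct children (mirrors the Ozone example).
    static final Map<String, List<String>> CHILDREN = new HashMap<>();
    static {
        CHILDREN.put("./dir1", Arrays.asList("./dir1/dir11"));
        CHILDREN.put("./dir1/dir11", Arrays.asList("./dir1/dir11/dir111"));
    }

    // Recursive strategy: add each created path, then walk its whole subtree.
    static List<String> recursiveListing(List<String> createdPaths) {
        List<String> copyList = new ArrayList<>();
        for (String path : createdPaths) {
            addWithSubtree(path, copyList);
        }
        return copyList;
    }

    private static void addWithSubtree(String path, List<String> copyList) {
        copyList.add(path);
        for (String child : CHILDREN.getOrDefault(path, Collections.emptyList())) {
            addWithSubtree(child, copyList);
        }
    }

    // Flat strategy: trust the diff and add only the paths it lists.
    static List<String> flatListing(List<String> createdPaths) {
        return new ArrayList<>(createdPaths);
    }

    // True if any path occurs more than once in the list.
    static boolean hasDuplicates(List<String> paths) {
        Set<String> seen = new HashSet<>();
        for (String p : paths) {
            if (!seen.add(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A flat diff, as in the Ozone output: every created subpath is listed.
        List<String> flatDiff =
            Arrays.asList("./dir1", "./dir1/dir11", "./dir1/dir11/dir111");
        System.out.println("recursive: " + recursiveListing(flatDiff));
        System.out.println("flat:      " + flatListing(flatDiff));
    }
}
```

With the flat diff, the recursive strategy in this sketch produces six entries for three distinct paths, which is the kind of duplication that would trigger a DuplicateFileException, while the flat strategy yields each path once. With a diff that lists only `./dir1`, the recursive strategy yields the three paths without duplicates, which matches the HDFS-style behaviour the issue describes.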