[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748198#comment-17748198
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653794909

   I agree, hasPathCapability is a good idea. Scheme-based dependencies have 
problems with viewfs-like setups. Otherwise it may need to depend on 
resolvePaths to work correctly for viewfs, I believe. 




> Support snapshot diff based copylisting for flat paths.
> ---
>
> Key: HDFS-17120
> URL: https://issues.apache.org/jira/browse/HDFS-17120
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Sadanand Shenoy
> Assignee: Sadanand Shenoy
> Priority: Major
> Labels: pull-request-available
>
> Currently, the diff-based copyListing used during the distcpSync step of an 
> incremental copy defaults to the SimpleCopyListing implementation. It 
> iterates through the DiffReport and, if the DiffType is Create and the path 
> is a directory, recursively traverses the directory and adds the subpaths to 
> the resultant copyList.
> This works fine for snapshotDiff implementations that include only 
> top-level directories in their DiffReport. If a snapshotDiff implementation 
> instead outputs flat paths, so that both a directory and its sub-directory 
> subpaths appear in the DiffReport, the traversal adds duplicate paths to 
> the copyList and a DuplicateFileException is thrown.
>  
> For example, the Ozone filesystem implementation of snapdiff between 2 
> snapshots shows all subpaths as part of the diff.
> {code:java}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [ ~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [ ~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2 
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> +    ./dir1
> +    ./dir1/dir11
> +    ./dir1/dir11/dir111  {code}
> We can see that even though dir11 and dir111 are subpaths, they are present 
> in the snapdiff. This is not the case for HDFS.
> This Jira aims to create a new copyListing implementation for diff-based 
> copyListing that doesn't traverse directories and only adds paths that are 
> present in the diff.
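
The failure mode described in the issue can be reproduced with a small, self-contained model. This is only a sketch under stated assumptions: `FlatDiffListingSketch`, `DuplicatePathException`, and `buildCopyList` are hypothetical names invented here, the directory tree is a plain in-memory map, and none of this is the DistCp code from the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of diff-based copy listing; not the DistCp implementation. */
public class FlatDiffListingSketch {

  /** Hypothetical stand-in for DistCp's DuplicateFileException. */
  static class DuplicatePathException extends RuntimeException {
    DuplicatePathException(String path) {
      super("Duplicate path in copy list: " + path);
    }
  }

  /**
   * Builds a copy list from the created paths of a diff.
   * The tree maps each directory to its direct children.
   */
  static List<String> buildCopyList(List<String> createdPaths,
      Map<String, List<String>> tree, boolean traverseDirectories) {
    List<String> copyList = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (String path : createdPaths) {
      add(path, tree, traverseDirectories, copyList, seen);
    }
    return copyList;
  }

  private static void add(String path, Map<String, List<String>> tree,
      boolean traverse, List<String> copyList, Set<String> seen) {
    if (!seen.add(path)) {
      throw new DuplicatePathException(path);
    }
    copyList.add(path);
    if (traverse) {
      // Recursive traversal: also add everything below a created directory.
      for (String child : tree.getOrDefault(path, List.of())) {
        add(child, tree, traverse, copyList, seen);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, List<String>> tree = new HashMap<>();
    tree.put("./dir1", List.of("./dir1/dir11"));
    tree.put("./dir1/dir11", List.of("./dir1/dir11/dir111"));

    // Flat diff, shaped like the Ozone output in the issue description.
    List<String> flatDiff =
        List.of("./dir1", "./dir1/dir11", "./dir1/dir11/dir111");

    // Without traversal, each diff entry is added exactly once.
    System.out.println(buildCopyList(flatDiff, tree, false));

    // With traversal, dir11 is reached via dir1 and then listed again.
    try {
      buildCopyList(flatDiff, tree, true);
    } catch (DuplicatePathException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model, a diff that lists only the top-level directory still needs traversal to pick up the children, which is consistent with the traversing behaviour remaining the default.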



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748172#comment-17748172
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

steveloughran commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653673632

   The way to be adaptive across filesystems is to call hasPathCapability() on 
the FS and ask it what it supports, rather than hard-coding various filesystem 
schemes.
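
To make the contrast concrete, here is a minimal, self-contained sketch of scheme matching versus capability probing. Everything in it is an assumption made for illustration: `ProbeableFs` is a toy interface rather than Hadoop's `FileSystem`, the capability string `example.snapshotdiff.flat-paths` is invented, and the mount wrapper simply assumes that a viewfs-like layer resolves the path and forwards the probe to the target.

```java
import java.util.Map;
import java.util.Set;

/** Toy model of capability probing; these types are not Hadoop classes. */
public class CapabilityProbeSketch {

  /** Hypothetical capability name, invented for this sketch. */
  static final String FLAT_SNAPSHOT_DIFF = "example.snapshotdiff.flat-paths";

  /** Minimal stand-in for a filesystem that can describe itself. */
  interface ProbeableFs {
    String scheme();
    boolean hasPathCapability(String path, String capability);
  }

  static ProbeableFs simpleFs(String scheme, Set<String> capabilities) {
    return new ProbeableFs() {
      public String scheme() { return scheme; }
      public boolean hasPathCapability(String path, String capability) {
        return capabilities.contains(capability);
      }
    };
  }

  /** Mount-table wrapper: resolves the path, then delegates the probe. */
  static ProbeableFs mountFs(Map<String, ProbeableFs> mounts) {
    return new ProbeableFs() {
      public String scheme() { return "viewfs"; }
      public boolean hasPathCapability(String path, String capability) {
        for (Map.Entry<String, ProbeableFs> e : mounts.entrySet()) {
          if (path.startsWith(e.getKey())) {
            return e.getValue().hasPathCapability(path, capability);
          }
        }
        return false;
      }
    };
  }

  /** Scheme-based decision: only recognises the schemes it hard-codes. */
  static boolean traverseByScheme(ProbeableFs fs) {
    return !"ofs".equals(fs.scheme());
  }

  /** Capability-based decision: asks the filesystem what it does. */
  static boolean traverseByCapability(ProbeableFs fs, String path) {
    return !fs.hasPathCapability(path, FLAT_SNAPSHOT_DIFF);
  }

  public static void main(String[] args) {
    ProbeableFs ozone = simpleFs("ofs", Set.of(FLAT_SNAPSHOT_DIFF));
    ProbeableFs hdfs = simpleFs("hdfs", Set.of());
    ProbeableFs view = mountFs(Map.of("/ozone", ozone, "/hdfs", hdfs));

    // The scheme check sees "viewfs" and wrongly chooses to traverse.
    System.out.println(traverseByScheme(view));
    // The capability probe reaches the mounted target and does not.
    System.out.println(traverseByCapability(view, "/ozone/vol11/buck1"));
    System.out.println(traverseByCapability(view, "/hdfs/data"));
  }
}
```

Hadoop's `FileSystem` exposes a `hasPathCapability(Path, String)` probe of this kind; whether a given wrapper forwards it after path resolution is exactly the viewfs concern raised in this thread, so the delegation shown here should be read as an assumption.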







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747828#comment-17747828
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653093330

   Filed HDFS-17131 for the same. Thanks @umamaheswararao, @ayushtkn, 
@swamirishi.







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747825#comment-17747825
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653089202

   Thanks @sadanand48 for the contribution. Thanks @swamirishi and @ayushtkn 
for the reviews.







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747823#comment-17747823
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao merged PR #5885:
URL: https://github.com/apache/hadoop/pull/5885







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747822#comment-17747822
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653086998

   @sadanand48 could you please file the follow-up JIRAs you were thinking of?







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-27 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747808#comment-17747808
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1653039629

   Thanks @ayushtkn for taking a look.
   
   1. If the config (`traverseDirectories`) is set to false in case of HDFS, it 
will cause `DuplicateFileException` and fail the distcp operation. Hence the 
default is set to true. 
   2. I agree, this is a good point, and it can solve the problems you 
described below where the same config object is used by both HDFS and Ozone in 
an application such as Hive replication. I will raise a follow-up patch/Jira 
to handle this case and have the config deduced from the fs scheme. 
   3. For `getTraverseExcludeList`: the redundant-copy case arises only when we 
recursively traverse and add paths not present in the diff. Here we add all 
paths from the diff itself, so there won't be any redundant paths. 
   
   







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747602#comment-17747602
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1652194090

   The latest changes look good to me. Thanks @sadanand48 for working on this.







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747591#comment-17747591
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1652149377

   :confetti_ball: **+1 overall**
   
   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 34m 11s | | trunk passed |
   | +1 :green_heart: | compile | 0m 25s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | compile | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | +1 :green_heart: | checkstyle | 0m 24s | | trunk passed |
   | +1 :green_heart: | mvnsite | 0m 27s | | trunk passed |
   | +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | +1 :green_heart: | spotbugs | 0m 42s | | trunk passed |
   | +1 :green_heart: | shadedclient | 21m 2s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 19s | | the patch passed |
   | +1 :green_heart: | compile | 0m 18s | | the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 |
   | +1 :green_heart: | javac | 0m 18s | | the patch passed |
   | +1 :green_heart: | compile | 0m 16s | | the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | +1 :green_heart: | javac | 0m 16s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | -0 :warning: | checkstyle | 0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/9/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) | hadoop-tools/hadoop-distcp: The patch generated 1 new + 20 unchanged - 0 fixed = 21 total (was 20) |
   | +1 :green_heart: | mvnsite | 0m 18s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 17s | | hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 0 new + 37 unchanged - 4 fixed = 37 total (was 41) |
   | +1 :green_heart: | javadoc | 0m 16s | | hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 37 unchanged - 4 fixed = 37 total (was 41) |
   | +1 :green_heart: | spotbugs | 0m 36s | | the patch passed |
   | +1 :green_heart: | shadedclient | 20m 48s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 13m 18s | | hadoop-distcp in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 29s | | The patch does not generate ASF License warnings. |
   | | | 99m 36s | | |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/9/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 395508a62168 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / bc3fd7fcf1233aef46acd48f7708a31c126787b8 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Test Results | 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747560#comment-17747560
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1275096942


##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,77 @@ public void testDuplicates() {
 }
   }
 
+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException as it recursively traverses src3 directory
+    // and also adds 3.txt,4.txt twice
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);

Review Comment:
   done.



##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,77 @@ public void testDuplicates() {
 }
   }
 
+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException as it recursively traverses src3 directory
+    // and also adds 3.txt,4.txt twice
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);
+      Assert.fail("Test failed " + e.getMessage());
+    } finally {
+      TestDistCpUtils.delete(fs, "/tmp");
+    }
+  }
+
+  @Test(timeout=1)
+  public void testDiffBasedSimpleCopyListingWithoutTraverseDirectory() {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // no exception expected in this case
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, false);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);

Review Comment:
   done.






[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747532#comment-17747532
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

steveloughran commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274999246


##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,77 @@ public void testDuplicates() {
     }
   }
 
+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException as it recursively traverses src3 directory
+    // and also adds 3.txt,4.txt twice
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);
+      Assert.fail("Test failed " + e.getMessage());
+    } finally {
+      TestDistCpUtils.delete(fs, "/tmp");
+    }
+  }
+
+  @Test(timeout=1)
+  public void testDiffBasedSimpleCopyListingWithoutTraverseDirectory() {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // no exception expected in this case
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, false);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);

Review Comment:
   same.



##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,77 @@ public void testDuplicates() {
     }
   }
 
+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException as it recursively traverses src3 directory
+    // and also adds 3.txt,4.txt twice
+    configuration.setBoolean(
+        DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_TRAVERSE_DIRECTORY, true);
+    try {
+      fs = FileSystem.get(getConf());
+      buildListingUsingSnapshotDiff(fs, configuration, distCpSync, listingFile);
+    } catch (IOException e) {
+      LOG.error("Exception encountered in test", e);

Review Comment:
   no need to catch; just throw
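
A minimal sketch of what this suggestion amounts to, in plain Java with no JUnit or Hadoop on the classpath. The names `buildListing` and `cleanedUp` are hypothetical stand-ins for `buildListingUsingSnapshotDiff` and the `TestDistCpUtils.delete` cleanup; the point is only that a method which declares `throws IOException` can drop the `catch` and keep the `finally`:

```java
import java.io.IOException;

public class ThrowDontCatch {
  // Hypothetical flag recording that the cleanup step ran.
  static boolean cleanedUp = false;

  // Hypothetical stand-in for the call under test.
  static void buildListing(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("listing failed");
    }
  }

  // Test-style method: no catch block, the exception simply propagates
  // to the caller, and the finally block still performs the cleanup.
  static void runTest(boolean fail) throws IOException {
    try {
      buildListing(fail);
    } finally {
      cleanedUp = true;
    }
  }

  public static void main(String[] args) {
    try {
      runTest(true);
    } catch (IOException e) {
      System.out.println("propagated: " + e.getMessage()
          + ", cleanedUp=" + cleanedUp);
    }
  }
}
```

With this shape the test framework reports the original exception and stack trace itself, so the log-and-fail wrapper is not needed.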






[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747480#comment-17747480
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1651686956

   :confetti_ball: **+1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 29s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 39s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 16s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/8/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) |  hadoop-tools/hadoop-distcp: The patch generated 1 new + 20 unchanged - 0 fixed = 21 total (was 20)  |
   | +1 :green_heart: |  mvnsite  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 0 new + 37 unchanged - 4 fixed = 37 total (was 41)  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 37 unchanged - 4 fixed = 37 total (was 41)  |
   | +1 :green_heart: |  spotbugs  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 55s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  13m 20s |  |  hadoop-distcp in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  96m 22s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/8/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux b944c168be7f 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 55d1196a354d68045d8d2452eb722671928a3d65 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747379#comment-17747379
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1651219900

   :broken_heart: **-1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m 47s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 34s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   0m 57s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  33m 55s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 20s | [/patch-mvninstall-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-mvninstall-hadoop-tools_hadoop-distcp.txt) |  hadoop-distcp in the patch failed.  |
   | -1 :x: |  compile  |   0m 20s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.  |
   | -1 :x: |  javac  |   0m 20s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.  |
   | -1 :x: |  compile  |   0m 19s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) |  hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.  |
   | -1 :x: |  javac  |   0m 19s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) |  hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 19s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) |  hadoop-tools/hadoop-distcp: The patch generated 1 new + 44 unchanged - 0 fixed = 45 total (was 44)  |
   | -1 :x: |  mvnsite  |   0m 21s | [/patch-mvnsite-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-mvnsite-hadoop-tools_hadoop-distcp.txt) |  hadoop-distcp in the patch failed.  |
   | -1 :x: |  javadoc  |   0m 20s | [/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/7/artifact/out/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-distcp in the 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747344#comment-17747344
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1651156351

   :broken_heart: **-1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 27s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  34m 59s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 37s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 15s | [/patch-mvninstall-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-mvninstall-hadoop-tools_hadoop-distcp.txt) |  hadoop-distcp in the patch failed.  |
   | -1 :x: |  compile  |   0m 15s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.  |
   | -1 :x: |  javac  |   0m 15s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.  |
   | -1 :x: |  compile  |   0m 15s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) |  hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.  |
   | -1 :x: |  javac  |   0m 15s | [/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-compile-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) |  hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) |  hadoop-tools/hadoop-distcp: The patch generated 1 new + 44 unchanged - 0 fixed = 45 total (was 44)  |
   | -1 :x: |  mvnsite  |   0m 15s | [/patch-mvnsite-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-mvnsite-hadoop-tools_hadoop-distcp.txt) |  hadoop-distcp in the patch failed.  |
   | -1 :x: |  javadoc  |   0m 16s | [/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/6/artifact/out/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-distcp in the 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747301#comment-17747301
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274432347


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
     }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.
+   * @throws java.io.IOException - Exception if any
+   */
+  public static CopyListing getDiffCopyListing(Configuration configuration,
+      Credentials credentials, DistCpSync distCpSync) throws IOException {
+    String copyListingClassName =

Review Comment:
   Removed this now





> Support snapshot diff based copylisting for flat paths.
> ---
>
> Key: HDFS-17120
> URL: https://issues.apache.org/jira/browse/HDFS-17120
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sadanand Shenoy
>Assignee: Sadanand Shenoy
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Diff-based copyListing, which is used during the distcpSync step 
> of an incremental copy, uses the SimpleCopyListing implementation by 
> default. This implementation iterates through the DiffReport, and if the 
> DiffType is Create and the path is a directory, it recursively traverses the 
> directory and adds the subpaths to the resultant copyList.
> This works fine for implementations of snapshotDiff that include only 
> top-level directories as part of their DiffReport. If a snapshotDiff 
> implementation instead outputs flat paths that include both the directory 
> and its sub-directory subpaths in its DiffReport, this leads to duplicate 
> paths in the copyList and a DuplicateFileException is thrown.
>  
> For example, the Ozone filesystem implementation of snapdiff between 2 
> snapshots shows all subpaths as part of the diff.
> {code:java}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [ ~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [ ~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2 
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> +    ./dir1
> +    ./dir1/dir11
> +    ./dir1/dir11/dir111  {code}
> We can see that even though dir11 and dir111 are subpaths, they are present 
> in the snapdiff. This is not the case for HDFS.
> This Jira aims to create a new copyListing impl for diff-based copyListing 
> that doesn't traverse the directory but only adds the paths that are 
> present in the diff.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747300#comment-17747300
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1651055042

   Thanks @umamaheswararao and @swamirishi for the review. I have updated the 
patch to use a flag instead of a new copyListing type. Please take a look.







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747298#comment-17747298
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274430970


##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,77 @@ public void testDuplicates() {
     }
   }
 
+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+    FileSystem fs = null;
+    Configuration configuration = getConf();
+    DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+    Path listingFile = new Path("/tmp/list");
+    // Throws DuplicateFileException when copyListing is SimpleCopyListing
+    // as it recursively traverses src3 directory and also adds 3.txt,4.txt twice
+    configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+        SimpleCopyListing.class.getName());

Review Comment:
   HDFS to Ozone -> traverseDirectory = true, as the HDFS diff contains only 
top-level dirs
   Ozone to HDFS -> traverseDirectory = false, as the Ozone diff contains all 
subpaths too
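
The two cases can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the DistCp implementation: `DiffListingSketch`, `TREE`, and `buildListing` are hypothetical names, the directory tree is a hard-coded toy mirroring the `dir1/dir11/dir111` example from the issue, and duplicates are rejected eagerly with an `IllegalStateException` as a stand-in for `DuplicateFileException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DiffListingSketch {
  // Toy directory tree (assumption): parent path -> direct children.
  static final Map<String, List<String>> TREE = new HashMap<>();
  static {
    TREE.put("/dir1", Arrays.asList("/dir1/dir11"));
    TREE.put("/dir1/dir11", Arrays.asList("/dir1/dir11/dir111"));
  }

  // Builds a copy list from the "created" paths reported by a snapshot diff.
  // When traverseDirectory is true, each reported directory is also walked
  // recursively, which is only safe if the diff lists top-level paths.
  static List<String> buildListing(List<String> createdPaths,
      boolean traverseDirectory) {
    List<String> copyList = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (String path : createdPaths) {
      add(path, copyList, seen);
      if (traverseDirectory) {
        addChildren(path, copyList, seen);
      }
    }
    return copyList;
  }

  static void addChildren(String dir, List<String> copyList,
      Set<String> seen) {
    for (String child
        : TREE.getOrDefault(dir, Collections.<String>emptyList())) {
      add(child, copyList, seen);
      addChildren(child, copyList, seen);
    }
  }

  // Eager duplicate check; a simplification made for this sketch.
  static void add(String path, List<String> copyList, Set<String> seen) {
    if (!seen.add(path)) {
      throw new IllegalStateException("duplicate path in copy list: " + path);
    }
    copyList.add(path);
  }

  public static void main(String[] args) {
    List<String> topLevelDiff = Arrays.asList("/dir1");
    List<String> flatDiff =
        Arrays.asList("/dir1", "/dir1/dir11", "/dir1/dir11/dir111");
    System.out.println(buildListing(topLevelDiff, true));
    System.out.println(buildListing(flatDiff, false));
    try {
      buildListing(flatDiff, true);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A top-level diff with traversal and a flat diff without traversal produce the same copy list, while a flat diff combined with traversal hits the duplicate check on the second reported path.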








[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747297#comment-17747297
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274429488


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java:
##
@@ -316,6 +303,42 @@ protected void doBuildListingWithSnapshotDiff(
     }
   }
 
+  /**
+   * Handle create Diffs and add to the copyList.
+   * If the path is a directory, iterate it recursively and add the paths
+   * to the result copyList.
+   *
+   * @param fileListWriter the list for holding processed results
+   * @param context The DistCp context with associated input options
+   * @param sourceRoot The rootDir of the source snapshot
+   * @param sourceFS the source Filesystem
+   * @param fileStatuses store the result fileStatuses to add to the copyList
+   * @param diff the SnapshotDiff report
+   * @throws IOException
+   */
+  protected void addCreateDiffsToFileListing(SequenceFile.Writer fileListWriter,
+      DistCpContext context, Path sourceRoot, FileSystem sourceFS,
+      List<FileStatus> fileStatuses, DiffInfo diff) throws IOException {
+    addToFileListing(fileListWriter, sourceRoot, diff.getTarget(), context);
+
+    FileStatus sourceStatus = sourceFS.getFileStatus(diff.getTarget());

Review Comment:
   I have changed it to use a flag now.





> Support snapshot diff based copylisting for flat paths.
> ---
>
> Key: HDFS-17120
> URL: https://issues.apache.org/jira/browse/HDFS-17120
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sadanand Shenoy
>Assignee: Sadanand Shenoy
>Priority: Major
>  Labels: pull-request-available
>
> Currently, for the diff-based copyListing used during the distcpSync step 
> of an incremental copy, the SimpleCopyListing implementation is used by 
> default. Its implementation iterates through the DiffReport and, if the 
> DiffType is Create and the path is a directory, recursively traverses the 
> directory and adds the subpaths to the resultant copyList.
> This works fine for implementations of snapshotDiff that include only 
> top-level directories in their DiffReport. If a snapshotDiff 
> implementation instead outputs flat paths that include both the directory and 
> its sub-directory subpaths in the DiffReport, this leads to duplicate paths in 
> the copyList and a DuplicateFileException is thrown.
>  
> For example, the Ozone filesystem implementation of snapdiff between 2 
> snapshots shows all subpaths as part of the diff.
> {code:java}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [ ~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [ ~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2 
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> +    ./dir1
> +    ./dir1/dir11
> +    ./dir1/dir11/dir111  {code}
> We can see that even though dir11 & dir111 are subpaths, they are present in 
> the snapdiff. This is not the case for HDFS, though.
> This Jira aims to create a new copyListing impl for diff-based 
> copyListing that doesn't traverse the directory but only adds paths that are 
> present in the diff.
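
To make the duplication described above concrete, here is a minimal, self-contained sketch. It deliberately does not use the DistCp classes: the class name `FlatDiffListingSketch`, the in-memory `TREE`, and both listing methods are hypothetical stand-ins for the recursive traversal done by SimpleCopyListing and for a listing that adds only the paths present in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class FlatDiffListingSketch {
  // Hypothetical in-memory directory tree: path -> direct children.
  static final Map<String, List<String>> TREE = new HashMap<>();
  static {
    TREE.put("./dir1", Arrays.asList("./dir1/dir11"));
    TREE.put("./dir1/dir11", Arrays.asList("./dir1/dir11/dir111"));
    TREE.put("./dir1/dir11/dir111", Collections.<String>emptyList());
  }

  // Recursive strategy: add the created path, then everything below it.
  static void addRecursively(String path, List<String> copyList) {
    copyList.add(path);
    List<String> children = TREE.get(path);
    if (children != null) {
      for (String child : children) {
        addRecursively(child, copyList);
      }
    }
  }

  // Mimics traversing each CREATE entry of the diff recursively.
  static List<String> recursiveListing(List<String> createdPaths) {
    List<String> copyList = new ArrayList<>();
    for (String path : createdPaths) {
      addRecursively(path, copyList);
    }
    return copyList;
  }

  // Flat strategy: trust the diff and add only the paths it reports.
  static List<String> flatListing(List<String> createdPaths) {
    return new ArrayList<>(createdPaths);
  }

  static boolean hasDuplicates(List<String> paths) {
    return new HashSet<>(paths).size() != paths.size();
  }

  public static void main(String[] args) {
    // A flat diff, as in the Ozone example above.
    List<String> flatDiff =
        Arrays.asList("./dir1", "./dir1/dir11", "./dir1/dir11/dir111");
    System.out.println(hasDuplicates(recursiveListing(flatDiff)));
    System.out.println(hasDuplicates(flatListing(flatDiff)));
  }
}
```

With a diff that reports only the top-level directory, the recursive strategy yields each path once; with a flat diff it revisits the subpaths, which is the situation that ends in a DuplicateFileException.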



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747296#comment-17747296
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274428899


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java:
##
@@ -316,6 +303,42 @@ protected void doBuildListingWithSnapshotDiff(
 }
   }
 
+  /**
+   * Handle create Diffs and add to the copyList.
+   * If the path is a directory, iterate it recursively and add the paths
+   * to the result copyList.
+   *
+   * @param fileListWriter the list for holding processed results
+   * @param context The DistCp context with associated input options
+   * @param sourceRoot The rootDir of the source snapshot
+   * @param sourceFS the source Filesystem
+   * @param fileStatuses store the result fileStatuses to add to the copyList
+   * @param diff the SnapshotDiff report
+   * @throws IOException
+   */
+  protected void addCreateDiffsToFileListing(SequenceFile.Writer fileListWriter,
+  DistCpContext context, Path sourceRoot, FileSystem sourceFS,
+  List fileStatuses, DiffInfo diff) throws IOException {
+addToFileListing(fileListWriter, sourceRoot, diff.getTarget(), context);
+
+FileStatus sourceStatus = sourceFS.getFileStatus(diff.getTarget());

Review Comment:
   Yes, I didn't do that so as not to change the existing behaviour of 
SimpleCopyListing. I am fine with using a flag here too.








[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747295#comment-17747295
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274428899


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java:
##
@@ -316,6 +303,42 @@ protected void doBuildListingWithSnapshotDiff(
 }
   }
 
+  /**
+   * Handle create Diffs and add to the copyList.
+   * If the path is a directory, iterate it recursively and add the paths
+   * to the result copyList.
+   *
+   * @param fileListWriter the list for holding processed results
+   * @param context The DistCp context with associated input options
+   * @param sourceRoot The rootDir of the source snapshot
+   * @param sourceFS the source Filesystem
+   * @param fileStatuses store the result fileStatuses to add to the copyList
+   * @param diff the SnapshotDiff report
+   * @throws IOException
+   */
+  protected void addCreateDiffsToFileListing(SequenceFile.Writer fileListWriter,
+  DistCpContext context, Path sourceRoot, FileSystem sourceFS,
+  List fileStatuses, DiffInfo diff) throws IOException {
+addToFileListing(fileListWriter, sourceRoot, diff.getTarget(), context);
+
+FileStatus sourceStatus = sourceFS.getFileStatus(diff.getTarget());

Review Comment:
   Yes, I didn't do that so as not to change the existing behaviour of SimpleCopyListing.








[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747293#comment-17747293
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274428340


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
 }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.

Review Comment:
   Done.








[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747236#comment-17747236
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274301605


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java:
##
@@ -316,6 +303,42 @@ protected void doBuildListingWithSnapshotDiff(
 }
   }
 
+  /**
+   * Handle create Diffs and add to the copyList.
+   * If the path is a directory, iterate it recursively and add the paths
+   * to the result copyList.
+   *
+   * @param fileListWriter the list for holding processed results
+   * @param context The DistCp context with associated input options
+   * @param sourceRoot The rootDir of the source snapshot
+   * @param sourceFS the source Filesystem
+   * @param fileStatuses store the result fileStatuses to add to the copyList
+   * @param diff the SnapshotDiff report
+   * @throws IOException
+   */
+  protected void addCreateDiffsToFileListing(SequenceFile.Writer fileListWriter,
+  DistCpContext context, Path sourceRoot, FileSystem sourceFS,
+  List fileStatuses, DiffInfo diff) throws IOException {
+addToFileListing(fileListWriter, sourceRoot, diff.getTarget(), context);
+
+FileStatus sourceStatus = sourceFS.getFileStatus(diff.getTarget());

Review Comment:
   Have you thought about just having an advanced flag to control this? I am not 
sure we will have many implementations of these copyListings.
   I am not against the current design; this is just a thought to keep things 
simple.
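
As a rough illustration of the flag idea raised in this comment, the sketch below gates directory traversal on a boolean setting. It is only a sketch under stated assumptions: `java.util.Properties` stands in for the Hadoop `Configuration`, and the key `example.diff.copylisting.traverse.directory`, the class `TraverseFlagSketch`, and its methods are hypothetical names, not part of the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class TraverseFlagSketch {
  // Hypothetical key, for illustration only; not a real DistCp property.
  static final String TRAVERSE_KEY =
      "example.diff.copylisting.traverse.directory";

  // Defaults to true so behaviour is unchanged unless the flag is set.
  static boolean shouldTraverse(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(TRAVERSE_KEY, "true"));
  }

  // Adds the created path itself and, only when traversal is enabled,
  // the paths found below it.
  static List<String> entriesForCreate(String created,
      List<String> descendants, Properties conf) {
    List<String> entries = new ArrayList<>();
    entries.add(created);
    if (shouldTraverse(conf)) {
      entries.addAll(descendants);
    }
    return entries;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    List<String> below =
        Arrays.asList("/tmp/in/src3/3.txt", "/tmp/in/src3/4.txt");
    System.out.println(entriesForCreate("/tmp/in/src3", below, conf));
    // With the flag off, only the path reported by the diff is added.
    conf.setProperty(TRAVERSE_KEY, "false");
    System.out.println(entriesForCreate("/tmp/in/src3", below, conf));
  }
}
```

Keeping the default at `true` preserves the existing recursive behaviour for diffs that report only top-level directories, while a flat-path diff can switch traversal off.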



##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,77 @@ public void testDuplicates() {
 }
   }
 
+  @Test(expected = DuplicateFileException.class, timeout = 1)
+  public void testDiffBasedSimpleCopyListing() throws IOException {
+FileSystem fs = null;
+Configuration configuration = getConf();
+DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+Path listingFile = new Path("/tmp/list");
+// Throws DuplicateFileException when copyListing is SimpleCopyListing
+// as it recursively traverses src3 directory and also adds 3.txt,4.txt twice
+configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+SimpleCopyListing.class.getName());

Review Comment:
   How should this config be set when we want to copy from HDFS to Ozone 
or vice versa?






[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747234#comment-17747234
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274292957


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
 }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.

Review Comment:
   Could you remove the fully qualified package name here? I don't think we 
have another custom IOException class.








[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747233#comment-17747233
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

umamaheswararao commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274291510


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
 }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.
+   * @throws java.io.IOException - Exception if any
+   */
+  public static CopyListing getDiffCopyListing(Configuration configuration,
+  Credentials credentials, DistCpSync distCpSync) throws IOException {
+String copyListingClassName =

Review Comment:
   Why are we assigning the default from config here? You are always reassigning 
it with copyListingClass.getName(), right?
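
One way to read this suggestion is sketched below: resolve the class first and derive the name from it once, using it only for the error message. This is an illustrative sketch, not the PR's code; `ListingFactorySketch`, the `Listing` interface, `SimpleListing`, and the single `String` constructor argument are hypothetical stand-ins for CopyListing and its constructor arguments.

```java
import java.lang.reflect.Constructor;

public class ListingFactorySketch {
  // Hypothetical stand-in for the CopyListing type.
  interface Listing {
    String label();
  }

  // Hypothetical implementation with the constructor shape the factory expects.
  static class SimpleListing implements Listing {
    private final String label;

    SimpleListing(String label) {
      this.label = label;
    }

    @Override
    public String label() {
      return label;
    }
  }

  // The class name is taken from the resolved class exactly once and is
  // used only to build a readable error message.
  static Listing create(Class<? extends Listing> listingClass, String label) {
    String className = listingClass.getName();
    try {
      Constructor<? extends Listing> constructor =
          listingClass.getDeclaredConstructor(String.class);
      return constructor.newInstance(label);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(
          "Unable to instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    Listing listing = create(SimpleListing.class, "diff");
    System.out.println(listing.label());
  }
}
```

Because the name is never read before the class is resolved, there is no need for an initial lookup that is immediately overwritten.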








[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747209#comment-17747209
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1650684670

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m 54s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   0m 58s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  33m 32s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 23s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 23s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 18s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/5/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) |  hadoop-tools/hadoop-distcp: The patch generated 1 new + 44 unchanged - 0 fixed = 45 total (was 44)  |
   | +1 :green_heart: |  mvnsite  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41)  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  |  hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41)  |
   | +1 :green_heart: |  spotbugs  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  33m 33s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  15m  8s |  |  hadoop-distcp in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 141m 30s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 5cd8625b1112 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 70ef6eb851e72186768c8d284c93e201ff05637f |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747164#comment-17747164
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1274102593


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
 }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.
+   * @throws java.io.IOException - Exception if any
+   */
+  public static CopyListing getDiffCopyListing(Configuration configuration,
+  Credentials credentials, DistCpSync distCpSync) throws IOException {
+String copyListingClassName =
+configuration.get(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+"");
+try {
+  Class copyListingClass =
+  configuration.getClass(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+  SimpleCopyListing.class, SimpleCopyListing.class);
+  copyListingClassName = copyListingClass.getName();
+  Constructor constructor =
+  copyListingClass.getDeclaredConstructor(Configuration.class,
+  Credentials.class, DistCpSync.class);
+  return constructor.newInstance(configuration, credentials,distCpSync);

Review Comment:
   Done.



##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,66 @@ public void testDuplicates() {
 }
   }
 
+  @Test(timeout=1)
+  public void testFlatDiffCopyListing() {
+FileSystem fs = null;
+try {
+  fs = FileSystem.get(getConf());
+  List srcPaths = new ArrayList();
+  srcPaths.add(new Path("/tmp/in"));
+  TestDistCpUtils.createFile(fs, "/tmp/in/src1/1.txt");
+  TestDistCpUtils.createFile(fs, "/tmp/in/src2/1.txt");
+  TestDistCpUtils.createFile(fs, "/tmp/in/src3/3.txt");
+  TestDistCpUtils.createFile(fs, "/tmp/in/src3/4.txt");
+  Path target = new Path("/tmp/out");
+  Path listingFile = new Path("/tmp/list");
+  // adding below flags useDiff & sync only to enable 
context.shouldUseSnapshotDiff()
+  final DistCpOptions options = new DistCpOptions.Builder(srcPaths, target)
+  .withUseDiff("snap1","snap2")
+  .withSyncFolder(true)
+  .build();
+  final DistCpContext context = new DistCpContext(options);
+  Configuration configuration = getConf();
+  configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+  FlatDiffCopyListing.class.getName());
+  DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+  // Create a dummy DiffInfo List that contains a directory + paths inside
+  // that directory as part of the diff.
+
+  ArrayList diffs = new ArrayList<>();
+  diffs.add(
+  new DiffInfo(new Path("/tmp/in/src3/"), new Path("/tmp/in/src3/"),
+  SnapshotDiffReport.DiffType.CREATE));
+  diffs.add(new DiffInfo(new Path("/tmp/in/src3/3.txt"),
+  new Path("/tmp/in/src3/3.txt"), SnapshotDiffReport.DiffType.CREATE));
+  diffs.add(new DiffInfo(new Path("/tmp/in/src3/4.txt"),
+  new Path("/tmp/in/src3/4.txt"), SnapshotDiffReport.DiffType.CREATE));
+  Mockito.when(distCpSync.prepareDiffListForCopyListing()).thenReturn(diffs);
+
+  CopyListing listing =
+  CopyListing.getDiffCopyListing(configuration, CREDENTIALS,distCpSync);
+  // won't throw DuplicateFileException as copyListing is FlatDiffCopyListing.
+  listing.buildListing(listingFile, context);
+
+  // Throws DuplicateFileException when copyListing is SimpleCopyListing
+  // as it recursively traverses src3 directory and also adds 3.txt,4.txt twice
+  configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,SimpleCopyListing.class.getName());
+  try{
+listing =
+CopyListing.getDiffCopyListing(configuration, CREDENTIALS,distCpSync);

Review Comment:
   Done.



##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,66 @@ public void testDuplicates() {
 }
   }
 
+  @Test(timeout=1)
+  public void testFlatDiffCopyListing() {
+FileSystem fs = null;
+try {
+  fs = FileSystem.get(getConf());
+  List srcPaths = new ArrayList();
+  srcPaths.add(new Path("/tmp/in"));
+  TestDistCpUtils.createFile(fs, "/tmp/in/src1/1.txt");
+  

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747052#comment-17747052
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

swamirishi commented on code in PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#discussion_r1273728074


##
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java:
##
@@ -305,6 +305,33 @@ public static CopyListing getCopyListing(Configuration configuration,
 }
   }
 
+  /**
+   * Public Factory method with which the appropriate Diff CopyListing implementation may be retrieved.
+   * @param configuration The input configuration.
+   * @param credentials Credentials object on which the FS delegation tokens are cached
+   * @param distCpSync DistcpSync object used to sync diffs between source and target.
+   * @return An instance of the appropriate CopyListing implementation.
+   * @throws java.io.IOException - Exception if any
+   */
+  public static CopyListing getDiffCopyListing(Configuration configuration,
+  Credentials credentials, DistCpSync distCpSync) throws IOException {
+String copyListingClassName =
+configuration.get(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+"");
+try {
+  Class copyListingClass =
+  configuration.getClass(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+  SimpleCopyListing.class, SimpleCopyListing.class);
+  copyListingClassName = copyListingClass.getName();
+  Constructor constructor =
+  copyListingClass.getDeclaredConstructor(Configuration.class,
+  Credentials.class, DistCpSync.class);
+  return constructor.newInstance(configuration, credentials,distCpSync);

Review Comment:
   ```suggestion
 return constructor.newInstance(configuration, credentials, distCpSync);
   ```



##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,66 @@ public void testDuplicates() {
 }
   }
 
+  @Test(timeout=1)
+  public void testFlatDiffCopyListing() {
+FileSystem fs = null;
+try {
+  fs = FileSystem.get(getConf());
+  List srcPaths = new ArrayList();
+  srcPaths.add(new Path("/tmp/in"));
+  TestDistCpUtils.createFile(fs, "/tmp/in/src1/1.txt");
+  TestDistCpUtils.createFile(fs, "/tmp/in/src2/1.txt");
+  TestDistCpUtils.createFile(fs, "/tmp/in/src3/3.txt");
+  TestDistCpUtils.createFile(fs, "/tmp/in/src3/4.txt");
+  Path target = new Path("/tmp/out");
+  Path listingFile = new Path("/tmp/list");
+  // adding below flags useDiff & sync only to enable context.shouldUseSnapshotDiff()
+  final DistCpOptions options = new DistCpOptions.Builder(srcPaths, target)
+  .withUseDiff("snap1","snap2")
+  .withSyncFolder(true)
+  .build();
+  final DistCpContext context = new DistCpContext(options);
+  Configuration configuration = getConf();
+  configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,
+  FlatDiffCopyListing.class.getName());
+  DistCpSync distCpSync = Mockito.mock(DistCpSync.class);
+  // Create a dummy DiffInfo List that contains a directory + paths inside
+  // that directory as part of the diff.
+
+  ArrayList diffs = new ArrayList<>();
+  diffs.add(
+  new DiffInfo(new Path("/tmp/in/src3/"), new Path("/tmp/in/src3/"),
+  SnapshotDiffReport.DiffType.CREATE));
+  diffs.add(new DiffInfo(new Path("/tmp/in/src3/3.txt"),
+  new Path("/tmp/in/src3/3.txt"), SnapshotDiffReport.DiffType.CREATE));
+  diffs.add(new DiffInfo(new Path("/tmp/in/src3/4.txt"),
+  new Path("/tmp/in/src3/4.txt"), SnapshotDiffReport.DiffType.CREATE));
+  Mockito.when(distCpSync.prepareDiffListForCopyListing()).thenReturn(diffs);
+
+  CopyListing listing =
+  CopyListing.getDiffCopyListing(configuration, CREDENTIALS,distCpSync);
+  // won't throw DuplicateFileException as copyListing is FlatDiffCopyListing.
+  listing.buildListing(listingFile, context);
+
+  // Throws DuplicateFileException when copyListing is SimpleCopyListing
+  // as it recursively traverses src3 directory and also adds 3.txt,4.txt twice
+  configuration.set(DistCpConstants.CONF_LABEL_DIFF_COPY_LISTING_CLASS,SimpleCopyListing.class.getName());
+  try{
+listing =
+CopyListing.getDiffCopyListing(configuration, CREDENTIALS,distCpSync);

Review Comment:
   ```suggestion
   CopyListing.getDiffCopyListing(configuration, CREDENTIALS, distCpSync);
   ```



##
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java:
##
@@ -167,6 +169,66 @@ public void testDuplicates() {
 }
   }
 
+  @Test(timeout=1)
+  public void testFlatDiffCopyListing() {
+FileSystem fs = null;
+try {

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746999#comment-17746999
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1649840700

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 30s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m  0s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 16s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/4/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) |  hadoop-tools/hadoop-distcp: The patch generated 7 new + 44 unchanged - 0 fixed = 51 total (was 44)  |
   | +1 :green_heart: |  mvnsite  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41)  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 39 unchanged - 2 fixed = 39 total (was 41)  |
   | +1 :green_heart: |  spotbugs  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 42s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  13m 20s |  |  hadoop-distcp in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  98m  4s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux ee6b33cd79ff 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 47e79e6ddd212e193c69fb0e09e43fee3a9d8dd1 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746929#comment-17746929
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1649663453

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   0m 55s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  33m 59s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 23s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 21s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 19s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/3/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) |  hadoop-tools/hadoop-distcp: The patch generated 7 new + 44 unchanged - 0 fixed = 51 total (was 44)  |
   | +1 :green_heart: |  mvnsite  |   0m 25s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 22s | [/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/3/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 1 new + 39 unchanged - 2 fixed = 40 total (was 41)  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  |  hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 0 new + 40 unchanged - 1 fixed = 40 total (was 41)  |
   | +1 :green_heart: |  spotbugs  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  33m 44s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  15m  2s |  |  hadoop-distcp in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 144m 51s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f1e2dd4127b0 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / b33ca7e27eade12523d3f6169b43ffc6f914accf |
   | Default Java | Private 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746826#comment-17746826
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1649328587

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 27s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 41s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 23s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 59s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 16s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) |  hadoop-tools/hadoop-distcp: The patch generated 7 new + 42 unchanged - 0 fixed = 49 total (was 42)  |
   | +1 :green_heart: |  mvnsite  |   0m 18s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 17s | [/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/2/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 generated 8 new + 40 unchanged - 1 fixed = 48 total (was 41)  |
   | -1 :x: |  javadoc  |   0m 16s | [/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/2/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) |  hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 generated 8 new + 40 unchanged - 1 fixed = 48 total (was 41)  |
   | +1 :green_heart: |  spotbugs  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 54s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  13m 15s |  |  hadoop-distcp in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  98m 29s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746795#comment-17746795
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1649205250

   @ayushtkn @steveloughran @umamaheswararao Can I please get a review?




> Support snapshot diff based copylisting for flat paths.
> ---
>
> Key: HDFS-17120
> URL: https://issues.apache.org/jira/browse/HDFS-17120
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Sadanand Shenoy
> Assignee: Sadanand Shenoy
> Priority: Major
> Labels: pull-request-available
>
> Currently, the Diff-based copyListing used during the distcpSync step of an 
> incremental copy defaults to the SimpleCopyListing implementation. That 
> implementation iterates through the DiffReport and, if the DiffType is 
> CREATE and the path is a directory, recursively traverses the directory 
> and adds the subpaths to the resultant copyList.
> This works fine for snapshotDiff implementations that include only 
> top-level directories in their DiffReport. If a snapshotDiff 
> implementation instead outputs flat paths, so that both a directory and 
> the subpaths beneath it appear in the DiffReport, the traversal adds those 
> subpaths a second time and building the copyList fails with a 
> DuplicateFileException.
>  
> For example, the Ozone filesystem implementation of snapdiff between two 
> snapshots shows all subpaths as part of the diff.
> {code:java}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [ ~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [ ~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2 
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> +    ./dir1
> +    ./dir1/dir11
> +    ./dir1/dir11/dir111  {code}
> Even though dir11 and dir111 are subpaths of dir1, they are present in the 
> snapdiff. This is not the case for HDFS.
> This Jira aims to create a new copyListing implementation for diff-based 
> copyListing that doesn't traverse directories but adds only the paths that 
> are present in the diff.
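
The duplication described above can be reproduced in a small, self-contained sketch. It deliberately does not use the Hadoop classes: `TREE`, `recursiveListing` and `flatListing` are hypothetical stand-ins for the file system, a SimpleCopyListing-style traversal and the flat listing this Jira proposes, so it illustrates the two strategies under those assumptions and is not the DistCp implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FlatDiffSketch {
  // Hypothetical stand-in for the file system: directory -> direct children.
  static final Map<String, List<String>> TREE = new TreeMap<>();
  static {
    TREE.put("/dir1", Arrays.asList("/dir1/dir11"));
    TREE.put("/dir1/dir11", Arrays.asList("/dir1/dir11/dir111"));
  }

  // Recursive strategy: add every created path, then walk everything below it.
  static List<String> recursiveListing(List<String> createdPaths) {
    List<String> out = new ArrayList<>();
    for (String path : createdPaths) {
      out.add(path);
      addChildren(path, out);
    }
    return out;
  }

  private static void addChildren(String dir, List<String> out) {
    for (String child : TREE.getOrDefault(dir, Collections.<String>emptyList())) {
      out.add(child);
      addChildren(child, out);
    }
  }

  // Flat strategy: trust the diff and add only the paths it reports.
  static List<String> flatListing(List<String> createdPaths) {
    return new ArrayList<>(createdPaths);
  }

  static boolean hasDuplicates(List<String> paths) {
    return new HashSet<>(paths).size() != paths.size();
  }

  public static void main(String[] args) {
    // A diff that reports only the top-level directory (HDFS-style).
    List<String> topLevelDiff = Arrays.asList("/dir1");
    // A diff that also reports every subpath (the flat, Ozone-style case).
    List<String> flatDiff =
        Arrays.asList("/dir1", "/dir1/dir11", "/dir1/dir11/dir111");

    System.out.println("recursive + top-level diff: "
        + hasDuplicates(recursiveListing(topLevelDiff)));
    System.out.println("recursive + flat diff:      "
        + hasDuplicates(recursiveListing(flatDiff)));
    System.out.println("flat + flat diff:           "
        + hasDuplicates(flatListing(flatDiff)));
  }
}
```

In this sketch only the combination of recursive traversal and a flat diff yields duplicates, which is the situation the real code rejects with DuplicateFileException.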



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746713#comment-17746713
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

prashantpogde commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1648796679

   @umamaheswararao Can you please take a look?







[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746645#comment-17746645
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

hadoop-yetus commented on PR #5885:
URL: https://github.com/apache/hadoop/pull/5885#issuecomment-1648595933

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 30s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 15s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |   0m 23s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 53s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |   0m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 16s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 13s | [/results-checkstyle-hadoop-tools_hadoop-distcp.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-distcp.txt) |  hadoop-tools/hadoop-distcp: The patch generated 7 new + 42 unchanged - 0 fixed = 49 total (was 42)  |
   | +1 :green_heart: |  mvnsite  |   0m 18s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 17s | [/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/1/artifact/out/patch-javadoc-hadoop-tools_hadoop-distcp-jdkUbuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.txt) |  hadoop-distcp in the patch failed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1.  |
   | -1 :x: |  javadoc  |   0m 16s | [/patch-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/1/artifact/out/patch-javadoc-hadoop-tools_hadoop-distcp-jdkPrivateBuild-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.txt) |  hadoop-distcp in the patch failed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09.  |
   | +1 :green_heart: |  spotbugs  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 48s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  14m 14s |  |  hadoop-distcp in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 105m 52s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5885/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5885 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 8407f2688740 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 

[jira] [Commented] (HDFS-17120) Support snapshot diff based copylisting for flat paths.

2023-07-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17746608#comment-17746608
 ] 

ASF GitHub Bot commented on HDFS-17120:
---

sadanand48 opened a new pull request, #5885:
URL: https://github.com/apache/hadoop/pull/5885

   ### Description of PR
   Currently, the diff-based copyListing used during the distcpSync step of an 
incremental copy defaults to the SimpleCopyListing implementation. That 
implementation iterates through the DiffReport and, if the DiffType is Create 
and the path is a directory, recursively traverses the directory and adds the 
subpaths to the resultant copyList.
   
   This PR adds a copyListing implementation that considers only the flat paths 
in the snapshotDiff report and doesn't traverse directories recursively.
   There is no impact on existing behaviour: the default copyListing impl for 
diff-based copy remains SimpleCopyListing, but it can be overridden via a 
config if desired.
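
The difference between the two listing strategies can be sketched with a small, self-contained model. This is only an illustration under stated assumptions: it does not use the real DistCp or DiffReport classes, and the names `FlatDiffListingSketch`, `CHILDREN`, `recursiveListing`, `flatListing` and `duplicates` are hypothetical. The directory layout mirrors the `dir1/dir11/dir111` example from the Jira description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative model only; it does not use the Hadoop DistCp classes. */
public class FlatDiffListingSketch {

  /** Hypothetical directory tree: parent path -> direct children. */
  static final Map<String, List<String>> CHILDREN = new HashMap<>();
  static {
    CHILDREN.put("dir1", Arrays.asList("dir1/dir11"));
    CHILDREN.put("dir1/dir11", Arrays.asList("dir1/dir11/dir111"));
  }

  /** Adds a created path and everything beneath it to the copy list. */
  static void addWithSubtree(String path, List<String> copyList) {
    copyList.add(path);
    for (String child
        : CHILDREN.getOrDefault(path, Collections.<String>emptyList())) {
      addWithSubtree(child, copyList);
    }
  }

  /** Models the recursive strategy: every created path is expanded. */
  static List<String> recursiveListing(List<String> createdPaths) {
    List<String> copyList = new ArrayList<>();
    for (String path : createdPaths) {
      addWithSubtree(path, copyList);
    }
    return copyList;
  }

  /** Models the flat strategy: only the paths named in the diff are added. */
  static List<String> flatListing(List<String> createdPaths) {
    return new ArrayList<>(createdPaths);
  }

  /** Returns the paths that occur more than once in a copy list. */
  static Set<String> duplicates(List<String> copyList) {
    Set<String> seen = new HashSet<>();
    Set<String> dups = new TreeSet<>();
    for (String path : copyList) {
      if (!seen.add(path)) {
        dups.add(path);
      }
    }
    return dups;
  }

  public static void main(String[] args) {
    // A flat diff already names every created subpath.
    List<String> flatDiff =
        Arrays.asList("dir1", "dir1/dir11", "dir1/dir11/dir111");
    System.out.println("recursive duplicates: "
        + duplicates(recursiveListing(flatDiff)));
    System.out.println("flat duplicates: "
        + duplicates(flatListing(flatDiff)));
  }
}
```

In this model, feeding a flat diff to the recursive strategy adds `dir1/dir11` and `dir1/dir11/dir111` more than once, which corresponds to the condition that triggers DuplicateFileException, while the flat strategy adds each path exactly once.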
   
   https://issues.apache.org/jira/browse/HDFS-17120
   
   ### How was this patch tested?
   ### For code changes:
   Added Unit tests
   
   




> Support snapshot diff based copylisting for flat paths.
> ---
>
> Key: HDFS-17120
> URL: https://issues.apache.org/jira/browse/HDFS-17120
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Sadanand Shenoy
>Assignee: Sadanand Shenoy
>Priority: Major
>
> Currently, the diff-based copyListing used during the distcpSync step of an 
> incremental copy defaults to the SimpleCopyListing implementation. That 
> implementation iterates through the DiffReport and, if the DiffType is Create 
> and the path is a directory, recursively traverses the directory and adds the 
> subpaths to the resultant copyList.
> This works fine for snapshotDiff implementations that include only top-level 
> directories in their DiffReport. If a snapshotDiff implementation instead 
> outputs flat paths, listing both a directory and its sub-directory subpaths 
> in its DiffReport, the recursive traversal adds duplicate paths to the 
> copyList and a DuplicateFileException is thrown.
>  
> For example, the Ozone filesystem implementation of snapdiff between two 
> snapshots shows all subpaths as part of the diff:
> {code:java}
> [~]# ozone sh snapshot create vol11/buck1 snap1
> [~]# ozone sh snapshot create vol11/buck2 snap1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11
> [~]# ozone fs -mkdir ofs://ozone1/vol11/buck1/dir1/dir11/dir111
> [~]# ozone sh snapshot create vol11/buck1 snap2
> [~]# ozone sh snapshot diff vol11/buck1 snap1 snap2
> Difference between snapshot: snap1 and snapshot: snap2
> +    ./dir1
> +    ./dir1/dir11
> +    ./dir1/dir11/dir111
> {code}
> Even though dir11 and dir111 are subpaths of dir1, they are present in the 
> snapdiff. This is not the case for HDFS.
> This Jira aims to create a new copyListing impl for diff-based copyListing 
> that doesn't traverse directories but adds only the paths that are present in 
> the diff.


