[ https://issues.apache.org/jira/browse/HADOOP-18117?focusedWorklogId=726239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-726239 ]
ASF GitHub Bot logged work on HADOOP-18117: ------------------------------------------- Author: ASF GitHub Bot Created on: 14/Feb/22 11:31 Start Date: 14/Feb/22 11:31 Worklog Time Spent: 10m Work Description: ferhui commented on a change in pull request #3970: URL: https://github.com/apache/hadoop/pull/3970#discussion_r805750088 ########## File path: hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSystem.java ########## @@ -551,4 +554,41 @@ public void testSourceRoot() throws Exception { String[] args2 = new String[]{rootStr, tgtStr2}; Assert.assertThat(ToolRunner.run(conf, new DistCp(), args2), is(0)); } + + @Test + public void testUpdateRootDirectoryAttributes() throws Exception { + FileSystem fs = cluster.getFileSystem(); + + Path source = new Path("/src"); + Path dest1 = new Path("/dest1"); + Path dest2 = new Path("/dest2"); + + fs.delete(source, true); + fs.delete(dest1, true); + fs.delete(dest2, true); + + // Create a source dir + fs.mkdirs(source); + fs.setOwner(source, "userA", "groupA"); + fs.setTimes(source, new Random().nextLong(), new Random().nextLong()); + + GenericTestUtils.createFiles(fs, source, 3, 5, 5); + + // should not preserve attrs + DistCpTestUtils.assertRunDistCp(DistCpConstants.SUCCESS, source.toString(), + dest1.toString(), "-p -update", conf); + + FileStatus srcStatus = fs.getFileStatus(source); + FileStatus destStatus1 = fs.getFileStatus(dest1); + assertNotEquals(destStatus1.getOwner(), srcStatus.getOwner()); + assertNotEquals(destStatus1.getModificationTime(), srcStatus.getModificationTime()); + + // should preserve attrs + DistCpTestUtils.assertRunDistCp(DistCpConstants.SUCCESS, source.toString(), + dest2.toString(), "-p -update -updateRootDirectoryAttributes", conf); + + FileStatus destStatus2 = fs.getFileStatus(dest2); + assertEquals(destStatus2.getOwner(), srcStatus.getOwner()); Review comment: Should exchange the expected actual objects in assertEquals. srcStatus.getOwner() is the expected one, destStatus2.getOwner() is the actual one. ########## File path: hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm ########## @@ -363,6 +363,7 @@ Command Line Options | `-xtrack <path>` | Save information about missing source files to the specified path. | This option is only valid with `-update` option. This is an experimental property and it cannot be used with `-atomic` option. | | `-direct` | Write directly to destination paths | Useful for avoiding potentially very expensive temporary file rename operations when the destination is an object store | | `-useiterator` | Uses single threaded listStatusIterator to build listing | Useful for saving memory at the client side. Using this option will ignore the numListstatusThreads option | +| `-updateRootDirectoryAttributes` | Update root directory attributes (eg permissions, ownership ...) | Useful if you need to enforce root directory attributes update when using distcp | Review comment: Could you replace updateRoot instead of updateRootDirectoryAttributes or give another short name? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 726239) Time Spent: 1h 20m (was: 1h 10m) > Add an option to preserve root directory permissions > ---------------------------------------------------- > > Key: HADOOP-18117 > URL: https://issues.apache.org/jira/browse/HADOOP-18117 > Project: Hadoop Common > Issue Type: Improvement > Reporter: Mohanad Elsafty > Assignee: Mohanad Elsafty > Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > As mentioned in https://issues.apache.org/jira/browse/HADOOP-15211 > > If *-update* or *-overwrite* is being passed when *distcp* used, the root > directory will be skipped in two occasions (CopyListing#doBuildListing & > CopyCommitter#preserveFileAttributesForDirectories), which will ignore root > directory's attributes. > > We face the same issue when distcp huge data between clusters and it takes > too much effort to update root directories attributes manually. > > From the earlier ticket it's obvious why this behaviour is there, but > sometime we need to enforce root directory update hence I will add a new > option for distcp to enable someone (who understands the need of this and > know what they are doing) to enforce the update of root directory's > attributes (permissions, ownership, ...) > > It should be simple one, something like this > {code:java} > $ hadoop distcp -p -update -updateRootDirectoryAttributes /a/b/c /a/b/d {code} > This behaviour is optional and will be *false* by default. (it should not > affect existing *distcp* users). -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org