[ https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138428#comment-17138428 ]
Stephen O'Donnell commented on HDFS-15406: ------------------------------------------ I looked into the test failures and the reason they are failing is very subtle, and is down to how File#toURI works. Quoting from the Java doc: {code} * The exact form of the URI is system-dependent. If it can be * determined that the file denoted by this abstract pathname is a * directory, then the resulting URI will end with a slash. {code} So it checks if a file is a directory or not, probably by checking for existence. If it it exists, the URI ends with a slash. If it does not exist, it does not end with a slash. The tests that are failing are Volume Failure tests, and the volumes are failed in the tests by renaming the directory to something else. That means that before this change, getBaseURI() returns: * "/path/to/dir/" - when the dir exists * "/path/to/dir" - when the dir has been failed. After this change, because we cache it, it always returns "/path/to/dir/". The test then compares "/path/to/dir/" with /path/to/dir in DatanodeTestUtils#getVolume(), which is ultimately why the tests fail. This change seems to fix the tests: {code} --- a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java +++ b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java @@ -239,7 +239,7 @@ public static FsVolumeImpl getVolume(DataNode dn, File basePath) throws try (FsDatasetSpi.FsVolumeReferences volumes = dn.getFSDataset() .getFsVolumeReferences()) { for (FsVolumeSpi vol : volumes) { - if (vol.getBaseURI().equals(basePath.toURI())) { + if (new File(vol.getBaseURI()).equals(basePath)) { return (FsVolumeImpl) vol; } } {code} This same issue is the reason behind TestDataNodeVolumeFailureReporting.testHotSwapOutFailedVolumeAndReporting too. [~hemanthboyina] could you update the patch and see if the tests look better after this change? > Improve the speed of Datanode Block Scan > ---------------------------------------- > > Key: HDFS-15406 > URL: https://issues.apache.org/jira/browse/HDFS-15406 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: hemanthboyina > Assignee: hemanthboyina > Priority: Major > Attachments: HDFS-15406.001.patch > > > In our customer cluster we have approx 10M blocks in one datanode > the Datanode to scans all the blocks , it has taken nearly 5mins > {code:java} > 2020-06-10 12:17:06,869 | INFO | > java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty > queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: > 11149530, missing metadata files:472, missing block files:472, missing blocks > in memory:0, mismatched blocks:0 | DirectoryScanner.java:473 > 2020-06-10 12:17:06,869 | WARN | > java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty > queue] | Lock held time above threshold: lock identifier: > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl > lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: > java.lang.Thread.getStackTrace(Thread.java:1559) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148) > org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186) > org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133) > org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84) > org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > | InstrumentedLock.java:143 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org