[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

Stephen O'Donnell (Jira) Wed, 17 Jun 2020 06:28:18 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138428#comment-17138428
 ]


Stephen O'Donnell commented on HDFS-15406:
------------------------------------------

I looked into the test failures and the reason they are failing is very subtle, 
and is down to how File#toURI works.

Quoting from the Java doc:

{code}
     * The exact form of the URI is system-dependent.  If it can be
     * determined that the file denoted by this abstract pathname is a
     * directory, then the resulting URI will end with a slash.
{code}

So it checks if a file is a directory or not, probably by checking for 
existence. If it it exists, the URI ends with a slash. If it does not exist, it 
does not end with a slash.

The tests that are failing are Volume Failure tests, and the volumes are failed 
in the tests by renaming the directory to something else.

That means that before this change, getBaseURI() returns:

 * "/path/to/dir/" - when the dir exists
 * "/path/to/dir" - when the dir has been failed.

After this change, because we cache it, it always returns "/path/to/dir/".

The test then compares "/path/to/dir/" with /path/to/dir in 
DatanodeTestUtils#getVolume(), which is ultimately why the tests fail. 

This change seems to fix the tests:

{code}
--- 
a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
+++ 
b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java
@@ -239,7 +239,7 @@ public static FsVolumeImpl getVolume(DataNode dn, File 
basePath) throws
     try (FsDatasetSpi.FsVolumeReferences volumes = dn.getFSDataset()
         .getFsVolumeReferences()) {
       for (FsVolumeSpi vol : volumes) {
-        if (vol.getBaseURI().equals(basePath.toURI())) {
+        if (new File(vol.getBaseURI()).equals(basePath)) {
           return (FsVolumeImpl) vol;
         }
       }
{code}

This same issue is the reason behind 
TestDataNodeVolumeFailureReporting.testHotSwapOutFailedVolumeAndReporting too.

[~hemanthboyina] could you update the patch and see if the tests look better 
after this change?

> Improve the speed of Datanode Block Scan
> ----------------------------------------
>
>                 Key: HDFS-15406
>                 URL: https://issues.apache.org/jira/browse/HDFS-15406
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: hemanthboyina
>            Assignee: hemanthboyina
>            Priority: Major
>         Attachments: HDFS-15406.001.patch
>
>
> In our customer cluster we have approx 10M blocks in one datanode 
> the Datanode to scans all the blocks , it has taken nearly 5mins
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

Reply via email to