[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136730#comment-17136730
 ] 

hemanthboyina commented on HDFS-15406:
--------------------------------------

thanks [~sodonnell] for the comment 
{quote}1. After you started caching `getBaseURI()` did it improve the runtime 
of both the getDiskReport() step and compare with in-memory step?{quote}
Yes, after caching getBaseURI() the runtime improved in both places; more 
details below.
{quote}2. Looking at the code on trunk, I don't think we create any scanInfo 
objects under the lock in the compare sections unless there is a difference. If 
this change improved your runtime under the lock from 6m -> 52 seconds, is this 
because there is a large number of differences between disk and memory on your 
cluster for some reason?
{quote}
I think you are referring to the creation of ScanInfo objects here, because 
creating a ScanInfo object uses vol.getBaseURI(). Even when we do not create 
ScanInfo objects, we still call getBaseURI() three times inside the lock, via 
info.getBlockFile(), info.getGenStamp(), and memBlock.compareWith(info). So if 
there is a large number of differences between disk and in-memory, the calls 
to getBaseURI() are even more frequent. By caching getBaseURI() we save at 
least three calls inside the lock and one outside it, so there was a huge 
decrease in lock hold time. We created 11M blocks in our independent cluster 
and observed that the lock was held for 3 min 20 sec before caching; after 
caching getBaseURI(), the lock time was 52 sec.
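To make the idea concrete, here is a minimal, self-contained sketch of the caching pattern described above. The class and field names (Volume, ScanInfo, baseUriCalls) are illustrative stand-ins, not the actual HDFS classes: the point is only that the base URI is resolved once per ScanInfo instead of on every accessor call made under the lock.

```java
import java.net.URI;

// Hypothetical stand-in for the volume object; getBaseURI() is assumed to
// be expensive because it rebuilds the URI on every invocation.
class Volume {
    int baseUriCalls = 0;                      // counts the expensive calls
    private final String dir;
    Volume(String dir) { this.dir = dir; }
    URI getBaseURI() {
        baseUriCalls++;
        return URI.create("file://" + dir);    // recomputed each call
    }
}

// Sketch of the caching idea: resolve the base URI once when the ScanInfo
// is built, so getBlockFile()/getMetaFile() reuse it instead of re-deriving
// it while the dataset lock is held.
class ScanInfo {
    private final URI cachedBaseUri;           // computed once per ScanInfo
    private final String blockSuffix;
    ScanInfo(Volume vol, String blockSuffix) {
        this.cachedBaseUri = vol.getBaseURI();
        this.blockSuffix = blockSuffix;
    }
    String getBlockFile() { return cachedBaseUri.getPath() + blockSuffix; }
    String getMetaFile()  { return cachedBaseUri.getPath() + blockSuffix + ".meta"; }
}

public class CachedBaseUriDemo {
    public static void main(String[] args) {
        Volume vol = new Volume("/data/dn1");
        ScanInfo info = new ScanInfo(vol, "/blk_1001");
        // Three accessor calls that, without caching, would each have hit
        // vol.getBaseURI() again:
        info.getBlockFile();
        info.getMetaFile();
        info.getBlockFile();
        System.out.println("getBaseURI calls: " + vol.baseUriCalls); // 1, not 3
    }
}
```

With many block differences, each comparison under the lock multiplies the accessor calls, which is why caching the URI once per object shrinks the lock hold time so sharply.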
{quote}3. Did you do capture any profiles (flame chart or debug log messages) 
to see how long each part of the code under the lock runs for? I am interested 
in these lines in DirectoryScanner#scan():
{quote}
I didn't capture profiles, but I could see dataset.getFinalizedBlocks(bpid) in 
the stack trace, and it took more than 600 ms on every iteration.
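In the absence of a profiler, a crude debug-timing wrapper around each step under the lock can answer the question above. This is a minimal sketch, not the Hadoop code; the getFinalizedBlocks stand-in below just materializes a large list to have something measurable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Minimal sketch of a debug-timing helper that logs how long each step
// under the dataset lock takes, e.g. the getFinalizedBlocks(bpid) call.
public class LockStepTiming {
    // Run a step, print its elapsed wall-clock time, and return its result.
    static <T> T timed(String label, Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " took " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for dataset.getFinalizedBlocks(bpid): build a large list.
        List<Long> finalized = timed("getFinalizedBlocks", () -> {
            List<Long> blocks = new ArrayList<>();
            for (long i = 0; i < 1_000_000; i++) blocks.add(i);
            return blocks;
        });
        System.out.println("finalized blocks: " + finalized.size());
    }
}
```

Wrapping each suspect call this way would show whether getFinalizedBlocks(bpid) or the compare loop dominates the lock hold time.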

 

> Improve the speed of Datanode Block Scan
> ----------------------------------------
>
>                 Key: HDFS-15406
>                 URL: https://issues.apache.org/jira/browse/HDFS-15406
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: hemanthboyina
>            Assignee: hemanthboyina
>            Priority: Major
>         Attachments: HDFS-15406.001.patch
>
>
> In our customer cluster we have approx 10M blocks in one datanode; for 
> the Datanode to scan all the blocks, it has taken nearly 5 minutes:
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
