[ https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136730#comment-17136730 ]
hemanthboyina commented on HDFS-15406:
--------------------------------------

Thanks [~sodonnell] for the comment.

{quote} # After you started caching `getBaseURI()` did it improve the runtime of both the getDiskReport() step and compare with in-memory step?{quote}
Yes, after caching getBaseURI the time taken improved in both places; more details below.

{quote}2. Looking at the code on trunk, I don't think we create any scanInfo objects under the lock in the compare sections unless there is a difference. If this change improved your runtime under the lock from 6m -> 52 seconds, is this because there is a large number of differences between disk and memory on your cluster for some reason? {quote}
I think you are referring to the creation of ScanInfo objects here, because creating a ScanInfo object uses vol.getBaseURI(). Even when we do not create ScanInfo objects, we still call getBaseURI three times inside the lock, via info.getBlockFile(), info.getGenStamp() and memBlock.compareWith(info); if there is a large number of differences between disk and in-memory state, the calls to getBaseURI become even more frequent. By caching getBaseURI we save at least three calls inside the lock and one outside it, which gave a large decrease in the lock hold time (a rough sketch of the caching idea is included at the end of this message). We created 11M blocks on our independent cluster: before caching, the lock was held for 3 min 20 sec; after caching getBaseURI, the lock time was 52 sec.

{quote}3. Did you do capture any profiles (flame chart or debug log messages) to see how long each part of the code under the lock runs for? I am interested in these lines in DirectoryScanner#scan(): {quote}
I didn't capture profiles, but I could see dataset.getFinalizedBlocks(bpid) in the stack trace, and it took more than 600 ms on every iteration.


> Improve the speed of Datanode Block Scan
> ----------------------------------------
>
>                 Key: HDFS-15406
>                 URL: https://issues.apache.org/jira/browse/HDFS-15406
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: hemanthboyina
>            Assignee: hemanthboyina
>            Priority: Major
>         Attachments: HDFS-15406.001.patch
>
>
> In our customer cluster we have approximately 10M blocks on one datanode. For the datanode to scan all the blocks, it has taken nearly 5 minutes.
> {code:java}
> 2020-06-10 12:17:06,869 | INFO | java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 11149530, missing metadata files:472, missing block files:472, missing blocks in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN | java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty queue] | Lock held time above threshold: lock identifier: org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is:
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> | InstrumentedLock.java:143
> {code}
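For reference, below is a minimal, self-contained sketch of the caching idea described in the comment above. The Volume and ScanInfo classes here are simplified stand-ins, not the real FsVolumeSpi/ScanInfo code, and the directory layout is made up; the only point illustrated is computing the volume's base URI once and reusing it, instead of rebuilding it for every block while the dataset lock is held.

{code:java}
import java.io.File;
import java.net.URI;
import java.util.Arrays;
import java.util.List;

/**
 * Illustration only: shows why repeatedly deriving a volume's base URI for
 * every block is expensive, and how caching it once removes that cost from
 * the section that runs under the dataset lock. Class names are simplified
 * stand-ins for the real HDFS classes.
 */
public class BaseUriCachingSketch {

  /** Simplified volume; the base URI is derived from its "current" directory. */
  static class Volume {
    private final File currentDir;
    private URI cachedBaseURI;          // computed once, then reused

    Volume(File currentDir) {
      this.currentDir = currentDir;
    }

    /** Uncached variant: new File + URI construction on every invocation. */
    URI getBaseURIUncached() {
      return new File(currentDir.getParent()).toURI();
    }

    /** Cached variant: cheap after the first call. */
    URI getBaseURI() {
      if (cachedBaseURI == null) {
        cachedBaseURI = new File(currentDir.getParent()).toURI();
      }
      return cachedBaseURI;
    }
  }

  /** Minimal stand-in for a per-block scan record. */
  static class ScanInfo {
    private final String blockSuffix;
    private final Volume volume;

    ScanInfo(String blockSuffix, Volume volume) {
      this.blockSuffix = blockSuffix;
      this.volume = volume;
    }

    /** Resolves the block path against the volume's base URI. */
    File getBlockFile() {
      return new File(volume.getBaseURI().getPath(), blockSuffix);
    }
  }

  public static void main(String[] args) {
    Volume vol = new Volume(new File("/data/dn/current"));
    List<ScanInfo> diskReport = Arrays.asList(
        new ScanInfo("BP-1/current/finalized/blk_1001", vol),
        new ScanInfo("BP-1/current/finalized/blk_1002", vol));

    // This loop stands in for the compare step that runs under the dataset
    // lock: with an uncached base URI, every accessor on every block would
    // pay the URI construction cost again; with the cached value it is paid
    // only once per volume.
    for (ScanInfo info : diskReport) {
      System.out.println("block file: " + info.getBlockFile());
    }
  }
}
{code}

In the same spirit as the comment above, once the value is cached the per-block calls made under the lock (info.getBlockFile(), info.getGenStamp(), memBlock.compareWith(info)) no longer repeat the URI construction for each block.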