Renukaprasad C created HDFS-16013:
-------------------------------------

             Summary: DirectoryScan operation holds dataset lock for long time
                 Key: HDFS-16013
                 URL: https://issues.apache.org/jira/browse/HDFS-16013
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Renukaprasad C


Environment: 3-node cluster with around 2M files and the same number of blocks.

All file operations are normal; only during the directory scan does the DataNode use more memory and hit long GC pauses. This directory scan runs every 6 hours (default value) and causes slow responses to file operations while it is in progress. The delay is around 5-8 seconds (in production this delay grew to 30+ seconds with 8M blocks).

GC Configuration:
-Xms6144M 
-Xmx12288M /8G
-XX:NewSize=614M 
-XX:MaxNewSize=1228M 
-XX:MetaspaceSize=128M 
-XX:MaxMetaspaceSize=128M 
-XX:CMSFullGCsBeforeCompaction=1 
-XX:MaxDirectMemorySize=1G 
-XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled 
-XX:+UseCMSCompactAtFullCollection 
-XX:CMSInitiatingOccupancyFraction=80 

We also tried G1 GC, but could not find much difference in the result.
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=45
-XX:G1ReservePercent=10
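
As a possible mitigation (a sketch, not a confirmed fix for this issue), the scan frequency and its per-second lock-hold budget can be tuned in hdfs-site.xml via the existing properties dfs.datanode.directoryscan.interval (seconds; the default 21600 is the 6-hour cadence seen here) and dfs.datanode.directoryscan.throttle.limit.ms.per.sec (HDFS-8873); the values below are illustrative only:

{code:xml}
<!-- Illustrative values only; defaults are 21600 and 1000 (throttle disabled). -->
<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <value>43200</value> <!-- scan every 12h instead of 6h -->
</property>
<property>
  <name>dfs.datanode.directoryscan.throttle.limit.ms.per.sec</name>
  <value>500</value> <!-- cap scanner lock time to ~500ms per second -->
</property>
{code}

Throttling spreads the scan out rather than shortening the reconcile step, so it may not remove the lock-held warning above, only reduce its impact.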


{code:java}
2021-05-07 16:32:23,508 INFO 
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
BP-345634799-<IP>-1619695417333 Total blocks: 2767211, missing metadata files: 
22, missing block files: 22, missing blocks in memory: 0, mismatched blocks: 0
2021-05-07 16:32:23,508 WARN 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock held 
time above threshold: lock identifier: FsDatasetRWLock lockHeldTimeMs=7061 ms. 
Suppressed 0 lock warnings. The stack trace is: 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:539)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:416)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:359)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
{code}
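
For context, the warning above comes from Hadoop's InstrumentedLock, which measures how long a lock was held and logs when the hold time crosses a threshold. The class below is a minimal self-contained sketch of that pattern (names and details are ours, not Hadoop's actual implementation), showing how an AutoCloseable lock wrapper can surface a long hold on release:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch (hypothetical, not Hadoop's InstrumentedLock) of a lock
// wrapper that records acquire time and warns on close() when the hold
// time exceeds a configured threshold.
public class HeldTimeLock implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();
    private final long thresholdMs;
    private long acquiredAtMs;
    private volatile long lastHeldMs;

    public HeldTimeLock(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    public HeldTimeLock acquire() {
        lock.lock();
        acquiredAtMs = System.currentTimeMillis();
        return this;
    }

    @Override
    public void close() {
        lastHeldMs = System.currentTimeMillis() - acquiredAtMs;
        lock.unlock();
        if (lastHeldMs > thresholdMs) {
            // Hadoop's real InstrumentedLock also captures the stack trace here,
            // which is what produces the trace shown in the log above.
            System.out.println("Lock held time above threshold: lockHeldTimeMs="
                + lastHeldMs + " ms.");
        }
    }

    public long lastHeldMs() {
        return lastHeldMs;
    }

    public static void main(String[] args) throws InterruptedException {
        HeldTimeLock l = new HeldTimeLock(50);
        try (HeldTimeLock held = l.acquire()) {
            Thread.sleep(100); // simulate a long scan while holding the lock
        }
    }
}
{code}

In DirectoryScanner.scan(), the whole reconcile of ~2.7M in-memory blocks against on-disk state runs inside one such lock scope, so the hold time grows with block count, matching the 7061 ms seen here.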

Our code already includes the following Jiras, but we are still seeing long lock-held times:
- https://issues.apache.org/jira/browse/HDFS-15621
- https://issues.apache.org/jira/browse/HDFS-15150
- https://issues.apache.org/jira/browse/HDFS-15160
- https://issues.apache.org/jira/browse/HDFS-13947


cc: [~brahma] [~belugabehr] [~sodonnell] [~ayushsaxena]  [~weichiu] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
