[jira] [Commented] (HDFS-14126) DataNode DirectoryScanner holding global lock for too long

2019-06-25 Thread tangyupeng (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872953#comment-16872953
 ] 

tangyupeng commented on HDFS-14126:
---

I'm hitting the same problem on version 2.8.4-amzn-1 too. It also seems that the 
DataNode restarts automatically for some reason after that.

 
{code:java}
2019-06-26 05:27:21,615 WARN 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
(VolumeScannerThread(/mnt/hdfs)): Lock held time above threshold: lock 
identifier: org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
lockHeldTimeMs=768 ms. Suppressed 8 lock warnings. The stack trace is: 
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1033)

... // here is some regular info log

2019-06-26 05:27:36,817 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
(main): STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: user = hdfs
{code}
So, how can this be avoided?
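
As a possible mitigation (illustrative only, not verified on 2.8.4-amzn-1), the scanner's schedule and throttle can be tuned in hdfs-site.xml. The property names below are the standard DirectoryScanner settings; the values are just examples:

{code:xml}
<!-- Example values only. Defaults: interval 21600 s (6 hours),
     throttle 1000 ms/s (i.e. disabled), 1 thread. -->
<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <!-- seconds between scans; a larger value spaces out the lock-heavy reconcile step -->
  <value>21600</value>
</property>
<property>
  <name>dfs.datanode.directoryscan.throttle.limit.ms.per.sec</name>
  <!-- cap the report-compiler threads to this many ms of run time per second -->
  <value>500</value>
</property>
<property>
  <name>dfs.datanode.directoryscan.threads</name>
  <!-- threads used to compile per-volume reports -->
  <value>1</value>
</property>
{code}

Note that the throttle limits the disk-walking report compilation rather than the time spent under the dataset lock during reconcile, so it may not directly reduce the lock hold times reported above.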

> DataNode DirectoryScanner holding global lock for too long
> --
>
> Key: HDFS-14126
> URL: https://issues.apache.org/jira/browse/HDFS-14126
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> I've got a Hadoop 3-based cluster set up, and this DN has just 434 thousand 
> blocks.
> And yet, DirectoryScanner holds the fsdataset lock for 2.7 seconds:
> {quote}
> 2018-12-03 21:33:09,130 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-4588049-10.17.XXX-XX-281857726 Total blocks: 434401, missing metadata 
> files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
> 2018-12-03 21:33:09,131 WARN 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock 
> held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=2710 ms. Suppressed 0 
> lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:473)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:373)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:318)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {quote}
> Log messages like this repeat every several hours (every 6 hours, to be exact). I am 
> not sure if this is a performance regression, or just the fact that the lock 
> information is printed in Hadoop 3. [~vagarychen] or [~templedf], do you know?
> There's no log in DN to indicate any sort of JVM GC going on. Plus, the DN's 
> heap size is set to several GB.
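
For context on where the warning itself comes from: the stack trace above is the instrumented lock wrapper noticing, at unlock time, that the lock was held past a threshold. A rough, simplified sketch of that pattern (not the actual org.apache.hadoop.util.InstrumentedLock/AutoCloseableLock code) is:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

/**
 * Simplified illustration only (not the real InstrumentedLock): a lock wrapper
 * that records when the lock was acquired and, on release, logs a warning if
 * the hold time exceeded a threshold. This is the mechanism behind the
 * "Lock held time above threshold" messages quoted above.
 */
public class TimedLockSketch implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();
  private final long warnThresholdMs;
  private long acquiredAtMs;

  public TimedLockSketch(long warnThresholdMs) {
    this.warnThresholdMs = warnThresholdMs;
  }

  /** Acquire the lock and remember the acquisition time. */
  public TimedLockSketch acquire() {
    lock.lock();
    acquiredAtMs = System.currentTimeMillis();
    return this;
  }

  /** Release the lock; warn if it was held longer than the threshold. */
  @Override
  public void close() {
    long heldMs = System.currentTimeMillis() - acquiredAtMs;
    lock.unlock();
    if (heldMs > warnThresholdMs) {
      System.err.println("Lock held time above threshold: lockHeldTimeMs=" + heldMs + " ms");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    TimedLockSketch datasetLock = new TimedLockSketch(300);
    // try-with-resources mirrors the AutoCloseableLock usage in the stack trace
    try (TimedLockSketch held = datasetLock.acquire()) {
      Thread.sleep(500); // stand-in for a long scan/reconcile step under the lock
    }
  }
}
{code}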






[jira] [Commented] (HDFS-14126) DataNode DirectoryScanner holding global lock for too long

2019-07-24 Thread Mingchen_Ma (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892409#comment-16892409
 ] 

Mingchen_Ma commented on HDFS-14126:


Why don't we use a ReadWriteLock instead of a plain Lock?
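
For illustration only (this is not the actual FsDatasetImpl code), the idea would be roughly the following: operations that only read the in-memory replica map share a read lock, while operations that modify it take the exclusive write lock, so a long read-only pass would not block other readers.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Illustration of the ReadWriteLock suggestion above; not the actual
 * FsDatasetImpl code. Read-only operations share the read lock; only
 * mutating operations need the exclusive write lock.
 */
public class DatasetReadWriteLockSketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true); // fair

  /** A read-only pass over in-memory block state, e.g. comparing it to disk. */
  public void scanBlocks() {
    rwLock.readLock().lock();
    try {
      // walk the in-memory replica map; other readers are not blocked
    } finally {
      rwLock.readLock().unlock();
    }
  }

  /** A mutating operation, e.g. adding or invalidating a replica. */
  public void updateBlock() {
    rwLock.writeLock().lock();
    try {
      // modify the replica map; readers and other writers are excluded
    } finally {
      rwLock.writeLock().unlock();
    }
  }
}
{code}

The trade-off is that any step that updates the in-memory state (as parts of reconcile do) would still need the write lock, so a read-write lock alone may not remove the long hold times.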







[jira] [Commented] (HDFS-14126) DataNode DirectoryScanner holding global lock for too long

2018-12-04 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709085#comment-16709085
 ] 

Chen Liang commented on HDFS-14126:
---

Thanks for reporting, [~jojochuang]. I have not seen this issue, though. I just 
randomly checked several DNs in our 3.1 cluster, with block counts ranging from 
330K to 1024K, and did not see this warning.







[jira] [Commented] (HDFS-14126) DataNode DirectoryScanner holding global lock for too long

2018-12-13 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719884#comment-16719884
 ] 

Daniel Templeton commented on HDFS-14126:
-

That error and the lag that causes it are exactly why the directory scanner 
throttle test is flaky. When working on that test, I noticed that we 
occasionally see that lock-held-too-long message, and it correlates with the 
directory scanner taking much longer than usual to complete a scan. So yes, 
it's a performance issue, but no, it's not a regression.



