[ https://issues.apache.org/jira/browse/HDFS-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549886#comment-13549886 ]
Chris Nauroth commented on HDFS-4328:
-------------------------------------

Thread dumps show the test hanging when {{DataBlockScanner#shutdown}} tries to join with the {{blockScannerThread}}:

{noformat}
"main" prio=5 tid=7fd86d800800 nid=0x10efc1000 in Object.wait() [10efbe000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1210)
	- locked <7c3965cd8> (a java.lang.Thread)
	at java.lang.Thread.join(Thread.java:1263)
	at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.shutdown(DataBlockScanner.java:251)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownDataBlockScanner(DataNode.java:490)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownPeriodicScanners(DataNode.java:462)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1104)
	...
{noformat}

Meanwhile, the {{blockScannerThread}} is stuck in an infinite wait loop in {{DataTransferThrottler#throttle}}:

{noformat}
"Thread-60" daemon prio=5 tid=7fd86c1a6800 nid=0x11c378000 in Object.wait() [11c377000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at org.apache.hadoop.hdfs.util.DataTransferThrottler.throttle(DataTransferThrottler.java:98)
	- locked <7c3c841a0> (a org.apache.hadoop.hdfs.util.DataTransferThrottler)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:526)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:653)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:397)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:476)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:633)
	at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:599)
	at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)
	...
{noformat}

It's likely that this infinite loop problem existed before the HDFS-4274 patch, but {{blockScannerThread}} was a daemon thread, so it didn't block datanode shutdown. With the HDFS-4274 patch, datanode shutdown now joins this thread and waits for it to finish, so the stuck thread blocks datanode shutdown.

I need to keep investigating why {{DataTransferThrottler#throttle}} is stuck in an infinite wait loop.

> TestLargeBlock#testLargeBlockSize is timing out
> -----------------------------------------------
>
>                 Key: HDFS-4328
>                 URL: https://issues.apache.org/jira/browse/HDFS-4328
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0
>            Reporter: Jason Lowe
>
> For some time now TestLargeBlock#testLargeBlockSize has been timing out on
> trunk. It is getting hung up during cluster shutdown, and after 15 minutes
> surefire kills it and causes the build to fail since it exited uncleanly.
> In addition to fixing the hang, we should consider adding a timeout parameter
> to the @Test decorator for this test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
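
The interaction described in the comment (a shutdown path that joins a thread which never leaves its wait loop) can be illustrated with a minimal, self-contained sketch. This is not HDFS code: {{JoinHangSketch}} and {{StuckThrottler}} are hypothetical names, and the unconditional loop merely stands in for whatever keeps {{DataTransferThrottler#throttle}} waiting, which is still under investigation. The sketch uses a bounded join so that it terminates; an unbounded {{join()}} in the same position would hang the way the "main" thread does in the dump.

```java
class JoinHangSketch {

    /** Hypothetical stand-in for a throttler whose wait loop never exits on its own. */
    static final class StuckThrottler {
        synchronized void throttle() throws InterruptedException {
            while (true) {   // assumption: the exit condition never becomes true
                wait(50);    // shows up as TIMED_WAITING in a thread dump
            }
        }
    }

    /**
     * Starts a daemon thread that blocks in throttle(), then joins it with a timeout.
     * Returns true only if the thread finished before the timeout elapsed.
     */
    static boolean joinCompletes(long timeoutMillis) {
        StuckThrottler throttler = new StuckThrottler();
        Thread scanner = new Thread(() -> {
            try {
                throttler.throttle();
            } catch (InterruptedException e) {
                // Interrupted: fall through and let the thread end.
            }
        });
        scanner.setDaemon(true); // a daemon thread alone would not keep the JVM alive
        scanner.start();
        try {
            scanner.join(timeoutMillis); // a plain join() here would wait forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean finished = !scanner.isAlive();
        scanner.interrupt(); // clean up the sketch's worker thread
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("join finished within timeout: " + joinCompletes(200));
    }
}
```

In this sketch the daemon flag does not help once another thread joins the worker: the joiner waits regardless of daemon status, which matches the observation that the hang only became visible after shutdown started joining the scanner thread.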