[ 
https://issues.apache.org/jira/browse/HDFS-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549886#comment-13549886
 ] 

Chris Nauroth commented on HDFS-4328:
-------------------------------------

Thread dumps show the test hanging when {{DataBlockScanner#shutdown}} tries to 
join with the {{blockScannerThread}}:

{noformat}
"main" prio=5 tid=7fd86d800800 nid=0x10efc1000 in Object.wait() [10efbe000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1210)
        - locked <7c3965cd8> (a java.lang.Thread)
        at java.lang.Thread.join(Thread.java:1263)
        at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.shutdown(DataBlockScanner.java:251)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownDataBlockScanner(DataNode.java:490)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownPeriodicScanners(DataNode.java:462)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1104)
        ...
{noformat}

Meanwhile, the {{blockScannerThread}} itself is stuck in an infinite wait loop 
in {{DataTransferThrottler#throttle}}:

{noformat}
"Thread-60" daemon prio=5 tid=7fd86c1a6800 nid=0x11c378000 in Object.wait() 
[11c377000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hdfs.util.DataTransferThrottler.throttle(DataTransferThrottler.java:98)
        - locked <7c3c841a0> (a org.apache.hadoop.hdfs.util.DataTransferThrottler)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:526)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:653)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:476)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:633)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scanBlockPoolSlice(BlockPoolSliceScanner.java:599)
        at org.apache.hadoop.hdfs.server.datanode.DataBlockScanner.run(DataBlockScanner.java:101)
        ...
{noformat}

It's likely that this infinite loop problem existed before the HDFS-4274 patch, 
but {{blockScannerThread}} was a daemon thread, so it didn't block datanode 
shutdown.  With the HDFS-4274 patch, datanode shutdown now joins this thread 
and waits for it to finish, so the stuck thread blocks datanode shutdown.
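
To illustrate the interaction described above, here is a minimal Java sketch. 
It is not the HDFS code: {{JoinHangSketch}}, {{waitLoop}}, {{joinOnly}} and 
{{stopThenJoin}} are hypothetical names, and the worker only assumes the general 
shape seen in the thread dump (a timed {{Object.wait}} inside a loop whose exit 
condition never becomes true). Joins are bounded here so the sketch itself 
cannot hang.

```java
// Hypothetical stand-in for the scanner thread / shutdown interaction.
// None of these names come from HDFS.
class JoinHangSketch {
    private final Object lock = new Object();
    private boolean stopRequested = false; // guarded by lock

    // Daemon worker, like the scanner thread described in the comment.
    final Thread worker = new Thread(this::waitLoop, "sketch-worker");

    static JoinHangSketch start() {
        JoinHangSketch s = new JoinHangSketch();
        s.worker.setDaemon(true);
        s.worker.start();
        return s;
    }

    // Timed wait inside a loop: a thread dump shows TIMED_WAITING, but the
    // thread never leaves the loop unless the condition changes.
    private void waitLoop() {
        synchronized (lock) {
            while (!stopRequested) {
                try {
                    lock.wait(50);
                } catch (InterruptedException e) {
                    return; // treat interruption as a request to exit
                }
            }
        }
    }

    // Join without changing the loop condition. timeoutMs must be > 0,
    // because join(0) waits forever. Returns true if the worker finished.
    boolean joinOnly(long timeoutMs) {
        return boundedJoin(timeoutMs);
    }

    // Clear the loop condition first, then join.
    boolean stopThenJoin(long timeoutMs) {
        synchronized (lock) {
            stopRequested = true;
            lock.notifyAll();
        }
        return boundedJoin(timeoutMs);
    }

    private boolean boundedJoin(long timeoutMs) {
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
        return !worker.isAlive();
    }
}
```

In this sketch, {{joinOnly}} times out with the worker still alive, which is 
the bounded analogue of the unbounded {{Thread.join}} in the "main" trace. 
{{stopThenJoin}} returns once the loop condition is cleared. Whether the real 
fix belongs in the throttler loop or in the shutdown path is still open.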

I need to keep investigating why {{DataTransferThrottler#throttle}} is stuck in 
an infinite wait loop.

                
> TestLargeBlock#testLargeBlockSize is timing out
> -----------------------------------------------
>
>                 Key: HDFS-4328
>                 URL: https://issues.apache.org/jira/browse/HDFS-4328
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0
>            Reporter: Jason Lowe
>
> For some time now TestLargeBlock#testLargeBlockSize has been timing out on 
> trunk.  It is getting hung up during cluster shutdown, and after 15 minutes 
> surefire kills it and causes the build to fail since it exited uncleanly.
> In addition to fixing the hang, we should consider adding a timeout parameter 
> to the @Test decorator for this test.

