[ 
https://issues.apache.org/jira/browse/HADOOP-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586115#action_12586115
 ] 

André Martin commented on HADOOP-3197:
--------------------------------------

Raghu,
right, there is no real circular dependency. However, all threads are waiting 
for the DataStreamer, which blocks them because it holds the lock on the 
LinkedList while java.net.SocketOutputStream.socketWrite0(Native Method) seems 
to be stuck for no apparent reason. Other DFS clients are able to write to the 
DFS cluster without any delays, and all datanodes are alive and active.
I finally killed the application (after taking this thread dump) since there 
had been no progress at all for more than 12 hours. No timeout or any other 
exception was thrown.
AFAIK socketWrite is a blocking call. However, even when the reader on the 
other end is slow, there should be at least some progress, shouldn't there?
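
To illustrate the pattern the thread dump suggests, here is a minimal sketch, 
not the actual DFSClient code. It assumes a streamer thread that holds the 
monitor of a shared LinkedList while a call that does not return is in 
progress, and a second thread that needs the same monitor. The class name 
LockHeldDuringWrite, the method observeWriterState, and the latch that stands 
in for the stuck socketWrite0 call are all hypothetical.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

public class LockHeldDuringWrite {

    // Returns the state of the writer thread while the streamer still holds the queue lock.
    static Thread.State observeWriterState() {
        final LinkedList<byte[]> dataQueue = new LinkedList<>();
        final CountDownLatch lockTaken = new CountDownLatch(1);
        // Hypothetical stand-in for the native socket write that never returned.
        final CountDownLatch writeReturns = new CountDownLatch(1);

        // Plays the role of the DataStreamer: takes the queue lock, then blocks inside it.
        Thread streamer = new Thread(() -> {
            synchronized (dataQueue) {
                lockTaken.countDown();
                try {
                    writeReturns.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "DataStreamer-sketch");

        // Plays the role of writeChunk: needs the same lock to enqueue a packet.
        Thread writer = new Thread(() -> {
            synchronized (dataQueue) {
                dataQueue.add(new byte[1]);
            }
        }, "writeChunk-sketch");

        try {
            streamer.start();
            lockTaken.await();          // the streamer now owns the monitor
            writer.start();
            // Poll until the writer is parked on the monitor held by the streamer.
            while (writer.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10);
            }
            Thread.State observed = writer.getState();
            writeReturns.countDown();   // let the simulated write return
            streamer.join();
            writer.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("writer state while the streamer is stuck: " + observeWriterState());
    }
}
```

One difference from the dump: the sketch's streamer waits on a latch, whereas 
the real DataStreamer is RUNNABLE inside a native write. The effect on the 
other threads is the same, they stay BLOCKED until the lock holder returns. 
Note also that Socket.setSoTimeout only bounds reads, so a stuck blocking 
write would not raise a timeout by itself.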

> Deadlock in DFCClient
> ---------------------
>
>                 Key: HADOOP-3197
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3197
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.16.1
>            Reporter: André Martin
>
> The DFS client hangs (thread dump attached). It looks like a deadlock to 
> me.
> {noformat}
> "ResponseProcessor for block blk_-7822837545361798562" prio=10 tid=0x00002aab993dcc00 nid=0x5241 waiting for monitor entry [0x000000004365e000..0x000000004365ecc0]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:1771)
>       - waiting to lock <0x00002aaaecf2dd08> (a java.util.LinkedList)
>
> "DataStreamer for file /seDNS/mapred-out/18A59C65A91D44E5BA24785DF103D1781BB0137E.cache.new block blk_-7822837545361798562" prio=10 tid=0x00002aab96a46000 nid=0x523f runnable [0x000000004345c000..0x000000004345cc40]
>    java.lang.Thread.State: RUNNABLE
>       at java.net.SocketOutputStream.socketWrite0(Native Method)
>       at java.net.SocketOutputStream.socketWrite(Unknown Source)
>       at java.net.SocketOutputStream.write(Unknown Source)
>       at java.io.BufferedOutputStream.write(Unknown Source)
>       - locked <0x00002aaaecf2ec50> (a java.io.BufferedOutputStream)
>       at java.io.DataOutputStream.write(Unknown Source)
>       - locked <0x00002aaaecf2ec20> (a java.io.DataOutputStream)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1623)
>       - locked <0x00002aaaecf2dd08> (a java.util.LinkedList)
>
> "BackupJobQueuesThread" prio=10 tid=0x00002aab94b94000 nid=0x7cb2 waiting for monitor entry [0x000000004244c000..0x000000004244cd40]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2117)
>       - waiting to lock <0x00002aaaecf2dd08> (a java.util.LinkedList)
>       at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
>       at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:124)
>       - locked <0x00002aaaecf2e670> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>       at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:58)
>       - locked <0x00002aaaecf2e670> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:36)
>       at java.io.DataOutputStream.writeBytes(Unknown Source)
>       at sedns.serializer.file.FileSerializerServer.serializeJobQueuesAndCache(FileSerializerServer.java:723)
>       - locked <0x00002aaab430fec8> (a java.util.Collections$SynchronizedSet)
>       at sedns.pastry.application.ServerApp$BackupJobListThread.run(ServerApp.java:476)
>
> "[EMAIL PROTECTED]" daemon prio=10 tid=0x00002aab94bc7c00 nid=0x7ca7 waiting on condition [0x0000000041941000..0x0000000041941bc0]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>       at java.lang.Thread.sleep(Native Method)
>       at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:597)
>       at java.lang.Thread.run(Unknown Source)
> {noformat}
>  
