[
https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649087#action_12649087
]
Steve Loughran commented on HADOOP-4659:
----------------------------------------
I'm going to put up a merged patch. The RPC test is passing, but the
spinning appears to be causing a deadlock in TestFileCreationClient; the
relevant parts of the thread dump follow.
1. We're sleeping here holding [EMAIL PROTECTED]
[junit] "DataStreamer for file /wrwelkj/file9 block
blk_-4298389317957709021_1010" id=133 idx=0x210 tid=25976 prio=5 alive, in
native, sleeping, native_waiting, daemon
[junit] at java/lang/Thread.sleep(J)V(Native Method)
[junit] at
org/apache/hadoop/ipc/Client$Connection.handleConnectionFailure(Client.java:373)
[junit] at
org/apache/hadoop/ipc/Client$Connection.setupIOstreams(Client.java:310)
[junit] ^-- Holding lock: org/apache/hadoop/ipc/[EMAIL PROTECTED] lock]
[junit] at
org/apache/hadoop/ipc/Client$Connection.access$1700(Client.java:177)
[junit] at org/apache/hadoop/ipc/Client.getConnection(Client.java:791)
[junit] at org/apache/hadoop/ipc/Client.call(Client.java:697)
[junit] at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:216)
[junit] at $Proxy7.getProtocolVersion(Ljava/lang/String;J)J(Unknown
Source)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:340)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:327)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:364)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:299)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:286)
2. Which is blocking this
[junit] -- Blocked trying to get lock: org/apache/hadoop/ipc/[EMAIL
PROTECTED] lock]
[junit] at jrockit/vm/Threads.sleep(I)V(Native Method)
[junit] at
jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[junit] at
jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[junit] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[junit] at
org/apache/hadoop/ipc/Client$Connection.addCall(Client.java:219)
[junit] at
org/apache/hadoop/ipc/Client$Connection.access$1600(Client.java:177)
[junit] at org/apache/hadoop/ipc/Client.getConnection(Client.java:785)
[junit] at org/apache/hadoop/ipc/Client.call(Client.java:697)
[junit] at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:216)
[junit] at $Proxy7.getProtocolVersion(Ljava/lang/String;J)J(Unknown
Source)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:340)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:327)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:364)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:299)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:286)
[junit] at
org/apache/hadoop/hdfs/DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:141)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2469)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.access$1700(DFSClient.java:1997)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
and this
[junit] -- Blocked trying to get lock: org/apache/hadoop/ipc/[EMAIL
PROTECTED] lock]
[junit] at jrockit/vm/Threads.sleep(I)V(Native Method)
[junit] at
jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[junit] at
jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[junit] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[junit] at
org/apache/hadoop/ipc/Client$Connection.addCall(Client.java:219)
[junit] at
org/apache/hadoop/ipc/Client$Connection.access$1600(Client.java:177)
[junit] at org/apache/hadoop/ipc/Client.getConnection(Client.java:785)
[junit] at org/apache/hadoop/ipc/Client.call(Client.java:697)
[junit] at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:216)
[junit] at $Proxy7.getProtocolVersion(Ljava/lang/String;J)J(Unknown
Source)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:340)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:327)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:364)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:299)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:286)
[junit] at
org/apache/hadoop/hdfs/DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:141)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2469)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.access$1700(DFSClient.java:1997)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
[junit] ^-- Holding lock: java/util/[EMAIL PROTECTED] lock]
[junit] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
[junit] -- end of trace
and this
[junit] "DataStreamer for file /wrwelkj/file5 block
blk_7479178383257153500_1010" id=127 idx=0x200 tid=25971 prio=5 alive, in
native, blocked, daemon
[junit] -- Blocked trying to get lock: org/apache/hadoop/ipc/[EMAIL
PROTECTED] lock]
[junit] at jrockit/vm/Threads.sleep(I)V(Native Method)
[junit] at
jrockit/vm/Locks.waitForThinRelease(Locks.java:1233)[optimized]
[junit] at
jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1307)[optimized]
[junit] at jrockit/vm/Locks.monitorEnter(Locks.java:2389)[optimized]
[junit] at
org/apache/hadoop/ipc/Client$Connection.addCall(Client.java:219)
[junit] at
org/apache/hadoop/ipc/Client$Connection.access$1600(Client.java:177)
[junit] at org/apache/hadoop/ipc/Client.getConnection(Client.java:785)
[junit] at org/apache/hadoop/ipc/Client.call(Client.java:697)
[junit] at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:216)
[junit] at $Proxy7.getProtocolVersion(Ljava/lang/String;J)J(Unknown
Source)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:340)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:327)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:364)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:299)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:286)
[junit] at
org/apache/hadoop/hdfs/DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:141)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2469)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream.access$1700(DFSClient.java:1997)
[junit] at
org/apache/hadoop/hdfs/DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2160)
[junit] ^-- Holding lock: java/util/[EMAIL PROTECTED] lock]
[junit] at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
So: the sleep reached from setupIOstreams (in handleConnectionFailure), taken
while holding the lock, appears to be blocking the other threads in addCall.
For some reason, <junit> isn't timing out or killing the process, which implies
this is fairly serious.
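To make the pattern in the dump concrete, here is a minimal sketch of a thread
sleeping inside a synchronized block while a second thread waits for the same
monitor. This is not the ipc.Client code: SleepHoldingLockDemo, connectionLock
and the method names are all made up for the example, and the only claim is
that a sleep taken while a monitor is held stalls every other thread that
needs that monitor for the full length of the sleep.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; none of these names come from Hadoop.
final class SleepHoldingLockDemo {
    // Stand-in for the monitor shown as "Holding lock" in the thread dump.
    private final Object connectionLock = new Object();

    // Mirrors the suspect pattern: a back-off sleep taken inside the monitor.
    void sleepHoldingLock(long sleepMillis, CountDownLatch lockHeld)
            throws InterruptedException {
        synchronized (connectionLock) {
            lockHeld.countDown();      // signal that the monitor is now held
            Thread.sleep(sleepMillis); // sleep does NOT release the monitor
        }
    }

    // Stand-in for a short synchronized operation such as adding a call.
    long millisToAcquireLock() {
        long start = System.nanoTime();
        synchronized (connectionLock) {
            // Nothing to do; we only measure how long entry took.
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Starts a sleeper thread, then measures how long the caller is blocked.
    static long measureBlockedMillis(long sleepMillis) {
        SleepHoldingLockDemo demo = new SleepHoldingLockDemo();
        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread sleeper = new Thread(() -> {
            try {
                demo.sleepHoldingLock(sleepMillis, lockHeld);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sleeper");
        sleeper.setDaemon(true);
        sleeper.start();
        try {
            lockHeld.await(); // wait until the sleeper really owns the monitor
            long waited = demo.millisToAcquireLock();
            sleeper.join();
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("caller blocked for ~"
                + measureBlockedMillis(500) + " ms");
    }
}
```

One way out, in general terms, is to do the sleep outside the synchronized
region, or to use Object.wait(timeout) on the monitor, which releases it while
waiting. Whether either fits the real code is a separate question.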
> Root cause of connection failure is being lost to code that uses it for
> delaying startup
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-4659
> URL: https://issues.apache.org/jira/browse/HADOOP-4659
> Project: Hadoop Core
> Issue Type: Bug
> Components: ipc
> Affects Versions: 0.18.3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
> Fix For: 0.18.3
>
> Attachments: connectRetry.patch, hadoop-4659.patch, rpcConn.patch
>
>
> In ipc.Client, the root cause of a connection failure is being lost as the
> exception is wrapped, hence the outside code, the one that looks for that
> root cause, isn't working as expected. The result is that you can't bring up a
> task tracker before the job tracker, and probably the same for a datanode before
> a namenode. The change that triggered this has not yet been located; I had
> thought it was HADOOP-3844, but I no longer believe this is the case.
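As an illustration of the failure mode described in the issue, the sketch
below contrasts wrapping an exception without its cause against wrapping it
with the cause attached, and shows a caller that walks the cause chain looking
for a ConnectException. It is an assumption-laden example, not the actual
ipc.Client or caller code: RootCauseDemo and its three methods are
hypothetical names chosen for this message.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical demo class; it does not reproduce any Hadoop API.
final class RootCauseDemo {
    // Wraps by copying only the message: the original exception type is lost.
    static IOException wrapLosingCause(String address, IOException e) {
        return new IOException("Call to " + address + " failed: " + e.getMessage());
    }

    // Wraps and keeps the original exception reachable through getCause().
    static IOException wrapKeepingCause(String address, IOException e) {
        return new IOException("Call to " + address + " failed", e);
    }

    // What "outside code" might do: search the chain for a connect failure.
    static boolean causedByConnectFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IOException root = new ConnectException("Connection refused");
        System.out.println("cause dropped: "
                + causedByConnectFailure(wrapLosingCause("host:1", root)));
        System.out.println("cause kept:    "
                + causedByConnectFailure(wrapKeepingCause("host:1", root)));
    }
}
```

With the first style of wrapping, a caller that retries only on connection
failures can no longer tell that the server simply was not up yet, which
matches the startup-ordering symptom described above.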