[ https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650567#action_12650567 ]
Steve Loughran commented on HADOOP-4659:
----------------------------------------
I'm going to push out my updated lifecycle patches shortly. One test I have
there brings up a tasktracker without the rest of the infrastructure (DFS,
jobtracker); it now hangs until the test times out, sleeping and retrying in
the IPC client while it waits for a job tracker that never arrives.
[junit] Tue Nov 25 13:50:13 2008
[junit] BEA JRockit(R) R27.4.0-90-89592-1.6.0_02-20070928-1715-linux-x86_64
[junit] "Main Thread" id=1 idx=0x4 tid=4074 prio=5 alive, in native, sleeping, native_waiting
[junit] at java/lang/Thread.sleep(J)V(Native Method)
[junit] at org/apache/hadoop/ipc/Client$Connection.handleConnectionFailure(Client.java:364)
[junit] at org/apache/hadoop/ipc/Client$Connection.setupIOstreams(Client.java:310)
[junit] ^-- Holding lock: org/apache/hadoop/ipc/[EMAIL PROTECTED] lock]
[junit] at org/apache/hadoop/ipc/Client$Connection.access$1800(Client.java:177)
[junit] at org/apache/hadoop/ipc/Client.getConnection(Client.java:792)
[junit] at org/apache/hadoop/ipc/Client.call(Client.java:688)
[junit] at org/apache/hadoop/ipc/RPC$Invoker.invoke(RPC.java:215)
[junit] at org/apache/hadoop/mapred/$Proxy0.getProtocolVersion(Ljava/lang/String;J)J(Unknown Source)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:347)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:334)
[junit] at org/apache/hadoop/ipc/RPC.getProxy(RPC.java:371)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:308)
[junit] at org/apache/hadoop/ipc/RPC.waitForProxy(RPC.java:285)
[junit] at org/apache/hadoop/mapred/TaskTracker.initialize(TaskTracker.java:454)
[junit] ^-- Holding lock: org/apache/hadoop/mapred/[EMAIL PROTECTED]
[junit] at org/apache/hadoop/mapred/TaskTracker.innerStart(TaskTracker.java:830)
[junit] ^-- Holding lock: org/apache/hadoop/mapred/[EMAIL PROTECTED] lock]
[junit] at org/apache/hadoop/util/Service.start(Service.java:186)
[junit] at org/apache/hadoop/util/Service.deploy(Service.java:654)
[junit] at org/apache/hadoop/mapred/TaskTracker.<init>(TaskTracker.java:965)
[junit] at org/apache/hadoop/mapred/TaskTracker.<init>(TaskTracker.java:948)
What I propose here is to give TaskTracker a timeout on its waitForProxy()
call, so that if the TT comes up before the JT there's a bit of leeway, but
eventually the TT will conclude that it is an orphan and that it cannot start
up.
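
To make the proposal concrete, here is a rough sketch of a bounded wait. It is
not the Hadoop code and not the patch: BoundedProxyWait, its waitForProxy()
signature, and the Callable standing in for the RPC.getProxy() call are all
made up for illustration. The only idea taken from the comment above is "retry
while the server is absent, but give up after a timeout".

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop: retries a connection attempt
// until it succeeds or a deadline passes.
final class BoundedProxyWait {

    static <T> T waitForProxy(Callable<T> connect, long timeoutMillis, long retryIntervalMillis)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            try {
                return connect.call();
            } catch (ConnectException e) {
                // Server not there yet: retry until the deadline, then fail
                // and keep the original exception as the cause.
                if (System.nanoTime() - deadline >= 0) {
                    throw new IOException("No server after " + timeoutMillis + " ms; giving up", e);
                }
                Thread.sleep(retryIntervalMillis);
            } catch (IOException | InterruptedException e) {
                // Anything other than "connection refused" is not retried.
                throw e;
            } catch (Exception e) {
                throw new IOException("Unexpected failure while connecting", e);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            // Simulates a tasktracker whose job tracker never arrives.
            waitForProxy(() -> { throw new ConnectException("Connection refused"); }, 200, 50);
        } catch (IOException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause() + ")");
        }
    }
}
```

With something along these lines the test above would fail with an exception
once the timeout expires instead of hanging until junit kills it. The actual
timeout and retry interval would presumably come from configuration.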
> Root cause of connection failure is being lost to code that uses it for
> delaying startup
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-4659
> URL: https://issues.apache.org/jira/browse/HADOOP-4659
> Project: Hadoop Core
> Issue Type: Bug
> Components: ipc
> Affects Versions: 0.18.3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
> Fix For: 0.18.3
>
> Attachments: connectRetry.patch, hadoop-4659.patch,
> hadoop-4659.patch, rpcConn.patch, rpcConn1.patch
>
>
> In ipc.Client, the root cause of a connection failure is being lost as the
> exception is wrapped, so the outside code that looks for that root cause
> isn't working as expected. The result is that you can't bring up a task
> tracker before the job tracker, and probably the same for a datanode before
> a namenode. The change that triggered this has not yet been located; I had
> thought it was HADOOP-3844, but I no longer believe this is the case.
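
For illustration only, one way calling code can stay robust against this kind
of wrapping is to walk the exception's cause chain rather than test the
top-level exception type. The sketch below is not taken from any of the
attached patches; RootCauseLookup and findCause() are hypothetical names, and
the depth limit of 32 is an arbitrary guard.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical helper, not part of Hadoop: finds a specific exception type
// anywhere in a chain of wrapped exceptions.
final class RootCauseLookup {

    /** Returns the first throwable of the given type in the cause chain, or null. */
    static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A connection failure that has been wrapped on its way up the stack.
        IOException wrapped =
                new IOException("Call failed", new ConnectException("Connection refused"));
        ConnectException root = findCause(wrapped, ConnectException.class);
        System.out.println(root != null ? "server not up yet, retry" : "fatal, do not retry");
    }
}
```

This only helps if the wrapper preserves the original exception as its cause;
if the wrapping code builds a new exception from the message alone, the root
cause is gone and has to be fixed at the point of wrapping.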