[
https://issues.apache.org/jira/browse/HADOOP-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647627#action_12647627
]
Steve Loughran commented on HADOOP-4659:
----------------------------------------
The problem could be (I repeat, could be) from HADOOP-2188, though I'm not
sure. There have been too many changes to roll back, and it's easier to go
forwards.
I have a patch that (correctly) puts the task tracker back to retrying:
[sf-startdaemon-debug] 08/11/14 15:06:43 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 5 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:43 [Thread-41] INFO datanode.DataNode :
BlockReport of 0 blocks got processed in 1 msecs
[sf-startdaemon-debug] 08/11/14 15:06:44 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 6 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:45 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 7 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:46 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 8 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:47 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 9 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:47 [TaskTracker] INFO ipc.RPC : Server at
localhost/127.0.0.1:8012 not available yet, Zzzzz...
[sf-startdaemon-debug] 08/11/14 15:06:49 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 0 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:50 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 1 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:51 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 2 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:52 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 3 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:53 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 4 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:53 [Thread-41] INFO datanode.DataNode :
BlockReport of 0 blocks got processed in 1 msecs
[sf-startdaemon-debug] 08/11/14 15:06:54 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 5 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:55 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 6 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:56 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 7 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:57 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 8 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:58 [TaskTracker] INFO ipc.Client :
Retrying connect to server: localhost/127.0.0.1:8012. Already tried 9 time(s).
[sf-startdaemon-debug] 08/11/14 15:06:58 [TaskTracker] INFO ipc.RPC : Server at
localhost/127.0.0.1:8012 not available yet, Zzzzz...
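The pattern in the log above (a round of connection attempts, a sleep, then another round) can be sketched roughly as follows. This is not the patch itself, only an illustration: the class name WaitForServerSketch, the method waitForServer and its parameters are all hypothetical, and each call to connect stands in for one full round of attempts. The loop only keeps waiting if the failure reaches it as a java.net.ConnectException.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical sketch, not Hadoop code: retry a connection round until the
// server comes up, sleeping between rounds.
public class WaitForServerSketch {

    /**
     * Calls connect until it succeeds. A ConnectException is treated as
     * "server not up yet"; any other exception propagates immediately.
     * Gives up and rethrows after maxRounds failed rounds.
     */
    public static <T> T waitForServer(Callable<T> connect, int maxRounds, long sleepMillis)
            throws Exception {
        for (int round = 1; ; round++) {
            try {
                return connect.call();
            } catch (ConnectException e) {
                if (round >= maxRounds) {
                    throw e; // out of patience: surface the real failure
                }
                System.out.println("Server not available yet, sleeping");
                Thread.sleep(sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated server that refuses the first two rounds.
        String result = waitForServer(() -> {
            if (++calls[0] < 3) {
                throw new ConnectException("Connection refused");
            }
            return "connected";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " rounds");
    }
}
```

If the ConnectException arrives wrapped inside some other exception type, the catch clause never matches and the loop fails on the first round instead of waiting.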
> Root cause of connection failure is being lost to code that uses it for
> delaying startup
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-4659
> URL: https://issues.apache.org/jira/browse/HADOOP-4659
> Project: Hadoop Core
> Issue Type: Bug
> Components: ipc
> Affects Versions: 0.19.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> In ipc.Client, the root cause of a connection failure is being lost because
> the exception is wrapped, so the outside code that looks for that root cause
> isn't working as expected. The result is that you can't bring up a
> task tracker before the job tracker, and probably not a datanode before
> a namenode either. The change that triggered this has not yet been located;
> I had thought it was HADOOP-3844, but I no longer believe this is the case.
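To illustrate the failure mode described above, here is a minimal sketch of how wrapping hides the root cause from an instanceof check, and how walking the cause chain recovers it. The class RootCauseDemo and the helper findCause are hypothetical names chosen for this example, and the exception messages are made up; this is not the ipc.Client code.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical illustration, not Hadoop code.
public class RootCauseDemo {

    /**
     * Walks the cause chain of t and returns the first throwable that is an
     * instance of type, or null if none is found. The depth limit guards
     * against a cyclic cause chain.
     */
    public static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
        int depth = 0;
        while (t != null && depth < 32) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            t = t.getCause();
            depth++;
        }
        return null;
    }

    public static void main(String[] args) {
        ConnectException root = new ConnectException("Connection refused");
        // Wrapping keeps the cause, but changes the type the caller sees.
        IOException wrapped = new IOException("wrapped failure", root);

        // A caller that only checks the outer type misses the root cause.
        System.out.println(wrapped instanceof ConnectException); // prints false

        // Walking the chain finds it again.
        System.out.println(findCause(wrapped, ConnectException.class) == root); // prints true
    }
}
```

Whether the better fix is to stop wrapping, or to make the calling code inspect the cause chain like this, depends on which callers rely on the outer exception type.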