[
https://issues.apache.org/jira/browse/HADOOP-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676915#action_12676915
]
Greg Wilkins commented on HADOOP-4744:
--------------------------------------
A couple of comments on the patches I've seen for this.
Firstly, I still don't understand how this is happening, unless there is a race
inside the JVM socket layer. Jetty's code here is synchronous, so this is not a
memory-visibility issue, and in your code the same thread performs both the bind
and the subsequent getLocalPort() call.
Still, I'll take your word that it is happening, so here are a couple of
comments on the patch.
It is possible to call open() on a connector (listener) before start(), so that
the bind happens early. So rather than starting the whole server, detecting the
failure and then trying again, I would make your init code call open() directly
on the listener.
You appear to be trying to search for a free port with:
listener.setPort(listener.getLocalPort() + 1);
but getLocalPort() is returning -1, so this will always set the port to 0.
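To see why the "+1" search always collapses to 0, here is a minimal sketch. It
uses a plain java.net.ServerSocket as a stand-in for the Jetty listener, which
is an assumption: the comment only says the listener returns -1. The
ServerSocket javadoc says getLocalPort() returns -1 while the socket is not yet
bound, so adding 1 yields 0. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class UnboundPortDemo {
    /** Applies the "next port" arithmetic from the patch to a socket that was never bound. */
    static int naiveNextPort() {
        try (ServerSocket unbound = new ServerSocket()) { // created, but bind() is never called
            int local = unbound.getLocalPort();          // -1 because the socket is not bound
            return local + 1;                            // -1 + 1 == 0, every time
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("setPort would receive: " + naiveNextPort());
    }
}
```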
But setting 0 as the port is probably what you want to do in the first place.
Setting the port to 0 means the operating system will pick a free port for
you, so you probably want to start with that.
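Here is a minimal sketch of the port-0 behaviour. It again uses
java.net.ServerSocket as an assumed stand-in for the connector. The name
EphemeralPortDemo and the helper bindEphemeral() are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    /** Binds to port 0 so the OS chooses a free port, then reports which port it chose. */
    static int bindEphemeral() {
        try (ServerSocket socket = new ServerSocket()) {
            // Port 0 asks the operating system for any currently free port.
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return socket.getLocalPort(); // the concrete port the OS assigned
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OS-assigned port: " + bindEphemeral());
    }
}
```

This helper closes the socket before returning, so it is only a demonstration.
A real server should keep the bound socket open and serve on it. Once the port
is released, another process could take it before a second bind.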
If you still get -1 back, then I would suspect a race in the JVM library. So
rather than stopping and relooping, it might be worthwhile to sleep for a
little while and call getLocalPort() again. Only if it still returns -1 should
you call close() on the connector.
So in pseudocode I would write your start method like this:
<pre>
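public void start() throws Exception
{
    final int limit = 10;        // illustrative maximum number of re-opens
    int retries = 0;
    listener.open();             // bind early, before the server is started
    int port = listener.getLocalPort();
    if (port < 0)
    {
        Thread.sleep(100);       // first just wait a little and look again
        port = listener.getLocalPort();
        while (port < 0)
        {
            if (retries++ >= limit)
                throw new IOException("No local port for " + listener);
            listener.close();    // only now cycle the connector
            listener.open();
            Thread.sleep(100);
            port = listener.getLocalPort();
        }
    }
    server.start();
}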
</pre>
I.e., you just cycle the connector/listener until you get a port, rather than
cycling the whole server.
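The same retry idea can be written as a self-contained, runnable sketch.
Listener is a hypothetical interface that only mirrors the three methods named
above (open(), close(), getLocalPort()); it is not Jetty's actual type.
FlakyListener is a fake that reports -1 a few times to simulate the suspected
race. The retry limit and pause are illustrative values.

```java
import java.io.IOException;

public class ConnectorRetry {
    /** Hypothetical stand-in for the connector/listener methods named in the comment. */
    interface Listener {
        void open() throws IOException;
        void close() throws IOException;
        int getLocalPort(); // assumed to return -1 when no port is available
    }

    /**
     * Opens the listener and returns its port. Only the listener (never the server)
     * is cycled while getLocalPort() reports -1; gives up after maxRetries re-opens.
     */
    static int openWithRetries(Listener listener, int maxRetries, long pauseMillis)
            throws IOException, InterruptedException {
        listener.open();
        int port = listener.getLocalPort();
        if (port < 0) {
            Thread.sleep(pauseMillis);     // first, simply wait and look again
            port = listener.getLocalPort();
        }
        int retries = 0;
        while (port < 0) {
            if (retries++ >= maxRetries) {
                throw new IOException("No local port after " + maxRetries + " re-opens");
            }
            listener.close();              // only now cycle the connector
            listener.open();
            Thread.sleep(pauseMillis);
            port = listener.getLocalPort();
        }
        return port;
    }

    /** Fake listener: getLocalPort() returns -1 for its first `failures` calls, then 8080. */
    static final class FlakyListener implements Listener {
        private int failures;
        int opens;

        FlakyListener(int failures) { this.failures = failures; }

        public void open() { opens++; }
        public void close() { }
        public int getLocalPort() { return failures-- > 0 ? -1 : 8080; }
    }

    public static void main(String[] args) throws Exception {
        FlakyListener flaky = new FlakyListener(3);
        int port = openWithRetries(flaky, 5, 10);
        // The real server would be started only at this point.
        System.out.println("port=" + port + " opens=" + flaky.opens);
    }
}
```

With three simulated failures, main prints `port=8080 opens=3`. The single
sleep-and-recheck absorbs one failure, and two re-opens absorb the rest, all
without touching the server.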
> Wrong resolution of hostname and port
> --------------------------------------
>
> Key: HADOOP-4744
> URL: https://issues.apache.org/jira/browse/HADOOP-4744
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: Amareshwari Sriramadasu
> Assignee: Devaraj Das
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: 4744.exception.patch, 4744.patch
>
>
> I noticed the following for one of the hosts in a cluster:
> 1. The machines.jsp page resolves the HTTP address as just "http://hostname"
> (which doesn't work). It doesn't include the port number for the host. Even if I
> add the port number manually in the URI, the task tracker page does not come
> up.
> 2. All the tasks (both maps and reduces) that ran on the machine completed
> successfully, but their task logs cannot be viewed because the port number is not
> resolved (same problem as in (1)).
> 3. The reducers waiting for maps that ran on that machine fail with connection
> failed errors saying the hostname is 'null'.
--
This message is automatically generated by JIRA.