[ https://issues.apache.org/jira/browse/HADOOP-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676915#action_12676915 ]

Greg Wilkins commented on HADOOP-4744:
--------------------------------------


A couple of comments on the patches I've seen for this.

Firstly, I still don't understand how this is happening, unless there is a race inside the JVM socket layer. Jetty's code here is synchronous, so it is not a memory caching issue, and in your code the same thread does the bind and then calls getLocalPort().

But I'll take your word that it is happening, so here are a couple of comments about the patch.

It is possible to call open() on a connector (listener) before start(), so that it does the bind early. So rather than starting the whole server, detecting failure and then trying again, I would make your init code call open() directly on the listener.
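A minimal sketch of the "bind early, then start" idea, using the JDK's java.net.ServerSocket as a stand-in for the Jetty connector (this is not Jetty's own API). The names EarlyBind, tryOpen and closeQuietly are hypothetical. The point it illustrates: a bind on a port that is already taken fails inside tryOpen(), before anything else would have been started, so there is nothing else to tear down.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch only: java.net.ServerSocket stands in for the Jetty connector,
// and tryOpen() plays the role of calling open() before server.start().
final class EarlyBind {

    /** Binds a listening socket on the loopback address; returns null if the bind fails. */
    static ServerSocket tryOpen(int port) {
        ServerSocket socket = null;
        try {
            socket = new ServerSocket();            // created, not yet bound
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return socket;                          // bound and listening
        } catch (IOException e) {
            closeQuietly(socket);                   // don't leak the unbound socket
            return null;
        }
    }

    /** Closes a socket, ignoring null and any IOException. */
    static void closeQuietly(ServerSocket socket) {
        if (socket == null) {
            return;
        }
        try {
            socket.close();
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
    }

    public static void main(String[] args) {
        ServerSocket first = tryOpen(0);            // let the OS pick a free port
        if (first == null) {
            System.out.println("could not bind at all");
            return;
        }
        // Same address and port again: expected to fail at bind time.
        ServerSocket second = tryOpen(first.getLocalPort());
        System.out.println("second bind failed early: " + (second == null));
        closeQuietly(second);
        closeQuietly(first);
        // Only after a successful open() would the rest of the server be started.
    }
}
```

On typical Linux and macOS systems the second bind is refused with a BindException, which tryOpen() turns into null.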

You appear to be trying to search for a free port with:
        listener.setPort(listener.getLocalPort() + 1);
but getLocalPort() is returning -1, so this will always set the port to 0.
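The arithmetic can be seen with a JDK class, again as a stand-in rather than the Jetty connector: java.net.ServerSocket.getLocalPort() is documented to return -1 while the socket is not bound, so "getLocalPort() + 1" on an unbound socket always gives 0. PortArithmetic is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Sketch only: mirrors the patch's "getLocalPort() + 1" step on a socket
// that has been created but never bound.
final class PortArithmetic {

    static int nextPortFromUnbound() {
        try (ServerSocket unbound = new ServerSocket()) {   // no bind() call
            int local = unbound.getLocalPort();             // -1: not bound yet
            return local + 1;                               // -1 + 1 == 0
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextPortFromUnbound());          // prints 0
    }
}
```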

But setting 0 as the port is probably what you want to do in the first place. Setting port 0 means the operating system will pick a free port for you, so you probably want to start with that.
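A small sketch of the port-0 semantics, assuming java.net.ServerSocket as a stand-in for setting port 0 on the Jetty connector: the OS assigns a free port at bind time, and getLocalPort() then reports the real number. EphemeralPort is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Sketch only: asks the OS for any free port by binding to port 0.
final class EphemeralPort {

    /**
     * Binds to port 0 and returns the port the OS actually assigned.
     * The socket is closed on return, so this only demonstrates the
     * semantics; a real server keeps serving on the socket it bound.
     */
    static int portAssignedForZero() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();   // a real port number, not 0 or -1
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("OS assigned port " + portAssignedForZero());
    }
}
```

The design point is that the socket that asked for port 0 should be the one that keeps listening on it; closing it and re-binding the number later would reopen a window in which another process could take the port.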

If you still get -1 back, then I would suspect a race in the JVM library. So rather than stopping and relooping, it might be worthwhile to sleep for a little while and try getLocalPort() again. Only if it still returns -1 should you call close() on the connector.

So, in pseudocode, I would write your start method something like this:

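<pre>
// Sketch only: listener is the Jetty connector, server the Jetty Server,
// and limit is a retry bound you choose.
void start() throws Exception
{
      listener.open();                  // bind now, before server.start()
      int port = listener.getLocalPort();
      if (port<0)
      {
           Thread.sleep(100);           // may be a transient race: check again first
           port = listener.getLocalPort();
           int retries = 0;
           while (port<0)
           {
               if (retries++>=limit)    // give up after limit retries
                   throw new IOException("listener has no local port after "+limit+" retries");
               listener.close();        // cycle only the connector, not the server
               listener.open();
               Thread.sleep(100);
               port = listener.getLocalPort();
           }
      }

      server.start();
}
</pre>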

I.e., you are just cycling the connector/listener until you get a port, rather than cycling the whole server.
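The start() logic above can be exercised without Jetty by putting the listener behind a small interface. Listener, ListenerRetry and FlakyListener are hypothetical names; only the open()/close()/getLocalPort() calls mirror what the comment describes. FlakyListener reports -1 for its first few getLocalPort() calls to simulate the suspected race.

```java
import java.io.IOException;

// Hypothetical minimal view of a connector: just the calls discussed above.
interface Listener {
    void open() throws IOException;
    void close() throws IOException;
    int getLocalPort();
}

final class ListenerRetry {

    /** Opens the listener and returns its port, cycling only the listener while the port is negative. */
    static int openWithRetry(Listener listener, int limit, long pauseMillis)
            throws IOException, InterruptedException {
        listener.open();
        int port = listener.getLocalPort();
        if (port < 0) {
            Thread.sleep(pauseMillis);              // maybe transient: look once more first
            port = listener.getLocalPort();
        }
        int retries = 0;
        while (port < 0) {
            if (retries++ >= limit) {               // give up after limit retries
                throw new IOException("no local port after " + limit + " retries");
            }
            listener.close();                       // cycle the listener, not the server
            listener.open();
            Thread.sleep(pauseMillis);
            port = listener.getLocalPort();
        }
        return port;
    }

    /** Fake listener that reports -1 for its first badReads getLocalPort() calls, then 8080. */
    static final class FlakyListener implements Listener {
        private int badReadsLeft;
        int opens;

        FlakyListener(int badReads) {
            this.badReadsLeft = badReads;
        }

        public void open() { opens++; }
        public void close() { }

        public int getLocalPort() {
            if (badReadsLeft > 0) {
                badReadsLeft--;
                return -1;
            }
            return 8080;
        }
    }

    public static void main(String[] args) throws Exception {
        FlakyListener flaky = new FlakyListener(3);
        int port = openWithRetry(flaky, 5, 10);
        System.out.println("port=" + port + " opens=" + flaky.opens); // port=8080 opens=3
    }
}
```

With FlakyListener(3) and a limit of 5, the method sleeps and re-checks once, then cycles the listener twice before a valid port appears, so main prints "port=8080 opens=3". A listener that never reports a port makes openWithRetry() throw once the limit is used up, and the server itself is never started.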

> Wrong resolution of hostname and port 
> --------------------------------------
>
>                 Key: HADOOP-4744
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4744
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: 4744.exception.patch, 4744.patch
>
>
> I noticed the following for one of the hosts in a cluster:
> 1. The machines.jsp page resolves the http address as just "http://hostname"
> (which doesn't work). It doesn't put the port number for the host. Even if I
> add the port number manually in the URI, the task tracker page does not come
> up.
> 2. All the tasks (both maps and reduces) which ran on the machine ran
> successfully. But task logs cannot be viewed, because the port number is not
> resolved (same problem as in (1)).
> 3. The reducers waiting for maps that ran on that machine fail with connection
> failed errors saying the hostname is 'null'.

-- 
This message is automatically generated by JIRA.