[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208982#comment-14208982
 ] 

Junping Du commented on MAPREDUCE-6156:
---------------------------------------

Thanks for review and comment, [~jlowe] and [~sseth]!
bq. it might be more informative to wrap it with a message stating how many 
attempts and how long we waited total to try to connect.
Nice catch, address it in v2 patch.

bq. Arguably we should be computing the socket timeout time just before we try 
to connect rather than just after we received the error.
I think it could be the same in math, even for the last try in unit time, as we 
will sleep for the rest of unit time (like Sid suggested below) and give 
connection another chance. May be no much different here?

bq.  I think we should return if stopped==true when that's received.
Sounds good. Addressed in v2 patch.

bq. he sleep time is computed as (currentTime-lastTime) - which is likely to be 
a tight loop while the NM is down.
Thanks for identifying this, I think we should sleep the rest of unit time 
(unit - (currentTime-lastTime)). Addressed in v2 patch.

> Fetcher - connect() doesn't handle connection refused correctly 
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6156
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6156
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Sidharta Seethana
>            Assignee: Junping Du
>            Priority: Blocker
>         Attachments: MAPREDUCE-6156.patch
>
>
> The connect() function in the fetcher assumes that whenever an IOException is 
> thrown, the amount of time passed equals "connectionTimeout" ( see code 
> snippet below ). This is incorrect. For example, in case the NM is down, an 
> ConnectException is thrown immediately - and the catch block assumes a minute 
> has passed when it is not the case.
> {code}
>   if (connectionTimeout < 0) {
>       throw new IOException("Invalid timeout "
>                             + "[timeout = " + connectionTimeout + " ms]");
>     } else if (connectionTimeout > 0) {
>       unit = Math.min(UNIT_CONNECT_TIMEOUT, connectionTimeout);
>     }
>     // set the connect timeout to the unit-connect-timeout
>     connection.setConnectTimeout(unit);
>     while (true) {
>       try {
>         connection.connect();
>         break;
>       } catch (IOException ioe) {
>         // update the total remaining connect-timeout
>         connectionTimeout -= unit;
>         // throw an exception if we have waited for timeout amount of time
>         // note that the updated value if timeout is used here
>         if (connectionTimeout == 0) {
>           throw ioe;
>         }
>         // reset the connect timeout for the last try
>         if (connectionTimeout < unit) {
>           unit = connectionTimeout;
>           // reset the connect time out for the final connect
>           connection.setConnectTimeout(unit);
>         }
>       }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to