[ https://issues.apache.org/jira/browse/MAPREDUCE-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208979#comment-14208979 ]
sidharta seethana commented on MAPREDUCE-6156:
----------------------------------------------

Agree with what [~sseth] said. We shouldn't use a tight loop in case of connection failures. It might also be simpler to use something like this: https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/io/retry/package-summary.html instead of rolling our own retry implementation.

> Fetcher - connect() doesn't handle connection refused correctly
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6156
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6156
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: sidharta seethana
>            Assignee: Junping Du
>            Priority: Blocker
>         Attachments: MAPREDUCE-6156.patch
>
>
> The connect() function in the fetcher assumes that whenever an IOException is
> thrown, the amount of time passed equals "connectionTimeout" (see code
> snippet below). This is incorrect. For example, in case the NM is down, a
> ConnectException is thrown immediately, and the catch block assumes a minute
> has passed when it is not the case.
> {code}
>     if (connectionTimeout < 0) {
>       throw new IOException("Invalid timeout "
>                             + "[timeout = " + connectionTimeout + " ms]");
>     } else if (connectionTimeout > 0) {
>       unit = Math.min(UNIT_CONNECT_TIMEOUT, connectionTimeout);
>     }
>     // set the connect timeout to the unit-connect-timeout
>     connection.setConnectTimeout(unit);
>     while (true) {
>       try {
>         connection.connect();
>         break;
>       } catch (IOException ioe) {
>         // update the total remaining connect-timeout
>         connectionTimeout -= unit;
>         // throw an exception if we have waited for timeout amount of time
>         // note that the updated value if timeout is used here
>         if (connectionTimeout == 0) {
>           throw ioe;
>         }
>         // reset the connect timeout for the last try
>         if (connectionTimeout < unit) {
>           unit = connectionTimeout;
>           // reset the connect time out for the final connect
>           connection.setConnectTimeout(unit);
>         }
>       }
>     }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
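
For illustration only, here is a minimal sketch of the two points raised in this thread: measure the time that actually elapsed instead of assuming each failed attempt consumed a full unit timeout, and pause between attempts so that an immediate ConnectException does not turn into a tight loop. It is plain Java, not the attached patch and not the org.apache.hadoop.io.retry package. The names ConnectRetrySketch, Connector, connectWithDeadline and the backoffMs parameter are hypothetical and do not appear in the Fetcher code.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Sketch only: deadline-based connect retry with a pause between attempts. */
final class ConnectRetrySketch {

  /** Hypothetical stand-in for "set the connect timeout, then connect()". */
  @FunctionalInterface
  interface Connector {
    void connect(int timeoutMs) throws IOException;
  }

  /**
   * Retries until the connector succeeds or the overall deadline is reached.
   * Returns the number of attempts made; rethrows the last IOException on timeout.
   */
  static int connectWithDeadline(Connector connector, long totalTimeoutMs,
      int unitTimeoutMs, long backoffMs) throws IOException {
    if (totalTimeoutMs <= 0 || unitTimeoutMs <= 0 || backoffMs < 0) {
      throw new IOException("Invalid timeout [total = " + totalTimeoutMs
          + " ms, unit = " + unitTimeoutMs + " ms, backoff = " + backoffMs + " ms]");
    }
    // Track real elapsed time with a monotonic clock.
    final long deadline = System.nanoTime() + totalTimeoutMs * 1_000_000L;
    int attempts = 0;
    while (true) {
      // Never pass 0 as a timeout: URLConnection treats 0 as "wait forever".
      long remainingMs = Math.max(1L, (deadline - System.nanoTime()) / 1_000_000L);
      attempts++;
      try {
        connector.connect((int) Math.min(unitTimeoutMs, remainingMs));
        return attempts;
      } catch (IOException ioe) {
        long leftMs = (deadline - System.nanoTime()) / 1_000_000L;
        // Give up if there is no time left for a pause plus another attempt.
        if (leftMs <= backoffMs) {
          throw ioe;
        }
        try {
          // Pause so a fast failure (e.g. connection refused) does not spin.
          Thread.sleep(backoffMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          InterruptedIOException e =
              new InterruptedIOException("Interrupted while waiting to retry");
          e.initCause(ie);
          throw e;
        }
      }
    }
  }
}
```

In the fetcher, the Connector would presumably wrap connection.setConnectTimeout(timeoutMs) followed by connection.connect(). With this shape, a refused connection is retried at most roughly totalTimeoutMs / backoffMs times before the last exception is rethrown, and the total wait tracks wall-clock time regardless of how quickly each attempt fails.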