[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822707#comment-13822707
 ] 

Hudson commented on MAPREDUCE-5616:
-----------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4739 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4739/])
MAPREDUCE-5616. MR Client-AppMaster RPC max retries on socket timeout is too 
high. Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1542001)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java


> MR Client-AppMaster RPC max retries on socket timeout is too high.
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5616
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: MAPREDUCE-5616.1.patch
>
>
> MAPREDUCE-3811 introduced a separate config key for overriding the max 
> retries applied to RPC connections from the MapReduce Client to the MapReduce 
> Application Master.  This was done to make failover from the AM to the 
> MapReduce History Server faster in the event that the AM completes while the 
> client thinks it's still running.  However, the RPC client uses a separate 
> setting for socket timeouts, and this one is not overridden.  The default for 
> this is 45 retries with a 20-second timeout on each retry.  This means that 
> in environments subject to connection timeout instead of connection refused, 
> the client can wait up to 15 minutes (45 x 20 seconds) before failing over.
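
The 15-minute figure in the description follows directly from multiplying the retry count by the per-retry timeout. The sketch below only reproduces that arithmetic; it is not Hadoop code. The class and method names are hypothetical, and the lowered retry count of 3 is an illustrative value chosen here, not one stated in this message.

```java
import java.time.Duration;

// Hypothetical helper, not part of Hadoop: illustrates the wait-time arithmetic only.
public class FailoverWait {

    /**
     * Worst-case time spent before giving up, assuming every attempt
     * blocks for the full socket timeout before the next retry starts.
     */
    static Duration worstCaseWait(int maxRetriesOnTimeout, Duration socketTimeout) {
        return socketTimeout.multipliedBy(maxRetriesOnTimeout);
    }

    public static void main(String[] args) {
        // Defaults quoted in the issue description: 45 retries, 20 seconds each.
        Duration defaults = worstCaseWait(45, Duration.ofSeconds(20));
        System.out.println("default: " + defaults.toMinutes() + " min");

        // Illustrative lower retry count (an assumption for this example).
        Duration lowered = worstCaseWait(3, Duration.ofSeconds(20));
        System.out.println("3 retries: " + lowered.getSeconds() + " s");
    }
}
```

With the defaults from the description this prints 15 minutes, which is why a separate, lower limit for retries on socket timeout shortens failover to the History Server.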



--
This message was sent by Atlassian JIRA
(v6.1#6144)
