[ https://issues.apache.org/jira/browse/MAPREDUCE-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822129#comment-13822129 ]
Bikas Saha commented on MAPREDUCE-5616:
---------------------------------------

Looks like a fairly straightforward change for a fairly non-trivial bug. Thanks Chris! +1.

> MR Client-AppMaster RPC max retries on socket timeout is too high.
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5616
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: MAPREDUCE-5616.1.patch
>
>
> MAPREDUCE-3811 introduced a separate config key for overriding the max
> retries applied to RPC connections from the MapReduce Client to the MapReduce
> Application Master.  This was done to make failover from the AM to the
> MapReduce History Server faster in the event that the AM completes while the
> client thinks it's still running.  However, the RPC client uses a separate
> setting for socket timeouts, and this one is not overridden.  The default for
> this is 45 retries with a 20-second timeout on each retry.  This means that
> in environments subject to connection timeout instead of connection refused,
> the client waits 15 minutes for failover.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
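
For reference, the 15-minute figure in the issue description follows directly from the two defaults it quotes (45 retries, 20 seconds per attempt). The sketch below only reproduces that arithmetic. The function and constant names are hypothetical and are not Hadoop configuration keys or APIs, and the calculation assumes no extra sleep between retries.

```python
# Hypothetical helper: reproduces the arithmetic from the issue description.
# Names are illustrative only and do not correspond to Hadoop APIs or config keys.

def worst_case_wait_seconds(max_retries_on_timeout: int, connect_timeout_s: int) -> int:
    """Upper bound on client wait time when every attempt ends in a socket timeout.

    Assumes each retry blocks for the full timeout and that there is
    no additional sleep between retries.
    """
    return max_retries_on_timeout * connect_timeout_s


# Defaults quoted in the issue description.
DEFAULT_MAX_RETRIES_ON_TIMEOUT = 45
DEFAULT_CONNECT_TIMEOUT_S = 20

total_s = worst_case_wait_seconds(DEFAULT_MAX_RETRIES_ON_TIMEOUT, DEFAULT_CONNECT_TIMEOUT_S)
print(f"{total_s} s = {total_s / 60:.0f} min")  # prints "900 s = 15 min"
```

Under the same assumptions, lowering the retry count to a small value such as 3 would bound the wait at 60 seconds, which is the kind of reduction a client-specific override of the timeout retry count would allow.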