[ https://issues.apache.org/jira/browse/MAPREDUCE-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated MAPREDUCE-5616:
-------------------------------------
    Target Version/s: 2.2.0, 3.0.0  (was: 3.0.0, 2.2.0)
              Status: Patch Available  (was: Open)

> MR Client-AppMaster RPC max retries on socket timeout is too high.
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5616
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0, 3.0.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: MAPREDUCE-5616.1.patch
>
>
> MAPREDUCE-3811 introduced a separate config key for overriding the max
> retries applied to RPC connections from the MapReduce Client to the MapReduce
> Application Master.  This was done to make failover from the AM to the
> MapReduce History Server faster in the event that the AM completes while the
> client thinks it's still running.  However, the RPC client uses a separate
> setting for socket timeouts, and this one is not overridden.  The default for
> this is 45 retries with a 20-second timeout on each retry.  This means that
> in environments subject to connection timeout instead of connection refused,
> the client waits 15 minutes for failover.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
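
The 15-minute figure in the description follows from multiplying the retry
count by the per-attempt timeout. The sketch below only reproduces that
arithmetic. It is not code from the attached patch and does not touch any
Hadoop API; the class name, the method name, and the lower retry count of 3
are hypothetical and chosen purely for illustration.

```java
import java.time.Duration;

// Back-of-the-envelope check of the failover delay described in the issue.
// Illustrative only: names here are hypothetical, not Hadoop identifiers.
final class RetryWaitEstimate {

    // Worst case: every attempt runs to the full socket timeout before failing.
    static Duration worstCaseWait(int maxRetriesOnTimeout, Duration perAttemptTimeout) {
        return perAttemptTimeout.multipliedBy(maxRetriesOnTimeout);
    }

    public static void main(String[] args) {
        // Defaults quoted in the issue: 45 retries, 20 seconds each.
        Duration withDefaults = worstCaseWait(45, Duration.ofSeconds(20));
        System.out.println("defaults: " + withDefaults.toMinutes() + " min");

        // Assumed lower cap (3 is an example value, not taken from the patch).
        Duration withLowerCap = worstCaseWait(3, Duration.ofSeconds(20));
        System.out.println("3 retries: " + withLowerCap.getSeconds() + " s");
    }
}
```

With the defaults from the description this gives 900 seconds, i.e. the
15 minutes reported above. Lowering only the retry count, with the timeout
left at 20 seconds, shrinks the worst-case wait proportionally.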