[jira] [Updated] (MAPREDUCE-5616) MR Client-AppMaster RPC max retries on socket timeout is too high.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated MAPREDUCE-5616:
-------------------------------------

          Resolution: Fixed
       Fix Version/s: 2.3.0
                      3.0.0
    Target Version/s: 3.0.0, 2.3.0  (was: 3.0.0, 2.2.0)
        Hadoop Flags: Reviewed
              Status: Resolved  (was: Patch Available)

Thanks for the review, Bikas.  I've committed this to trunk and branch-2.

> MR Client-AppMaster RPC max retries on socket timeout is too high.
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5616
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: MAPREDUCE-5616.1.patch
>
>
> MAPREDUCE-3811 introduced a separate config key for overriding the max
> retries applied to RPC connections from the MapReduce Client to the MapReduce
> Application Master.  This was done to make failover from the AM to the
> MapReduce History Server faster in the event that the AM completes while the
> client thinks it's still running.  However, the RPC client uses a separate
> setting for socket timeouts, and this one is not overridden.  The default for
> this is 45 retries with a 20-second timeout on each retry.  This means that
> in environments subject to connection timeout instead of connection refused,
> the client waits 15 minutes for failover.

--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Updated] (MAPREDUCE-5616) MR Client-AppMaster RPC max retries on socket timeout is too high.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated MAPREDUCE-5616:
-------------------------------------

    Target Version/s: 2.2.0, 3.0.0  (was: 3.0.0, 2.2.0)
              Status: Patch Available  (was: Open)

> MR Client-AppMaster RPC max retries on socket timeout is too high.
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5616
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0, 3.0.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: MAPREDUCE-5616.1.patch

[jira] [Updated] (MAPREDUCE-5616) MR Client-AppMaster RPC max retries on socket timeout is too high.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated MAPREDUCE-5616:
-------------------------------------

    Attachment: MAPREDUCE-5616.1.patch

I'm attaching a patch for supporting override of max retries on socket connection timeouts.  I chose a default of 3 retries.

> MR Client-AppMaster RPC max retries on socket timeout is too high.
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5616
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5616
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: MAPREDUCE-5616.1.patch

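
The 15-minute figure in the issue description follows from multiplying the retry count by the per-retry timeout. The sketch below reproduces that arithmetic for the old default (45 retries) and for the default of 3 retries chosen in the patch. It is only an illustration: the class and method names are hypothetical and are not part of Hadoop, it assumes the 20-second connect timeout stays unchanged, and it ignores the initial attempt and any sleep between retries.

```java
import java.time.Duration;

public class RetryWaitSketch {

    /**
     * Approximate upper bound on the time a client spends waiting on
     * connection timeouts: one full connect timeout per retry.
     * Hypothetical helper for illustration; not a Hadoop API.
     */
    public static Duration worstCaseWait(int maxRetriesOnTimeouts, Duration connectTimeout) {
        return connectTimeout.multipliedBy(maxRetriesOnTimeouts);
    }

    public static void main(String[] args) {
        // Assumption: the 20-second timeout from the issue description applies in both cases.
        Duration connectTimeout = Duration.ofSeconds(20);

        Duration before = worstCaseWait(45, connectTimeout); // default cited in the issue
        Duration after = worstCaseWait(3, connectTimeout);   // default chosen in the patch

        System.out.println("45 retries: " + before.toMinutes() + " min"); // prints "45 retries: 15 min"
        System.out.println("3 retries: " + after.getSeconds() + " s");    // prints "3 retries: 60 s"
    }
}
```

Under these assumptions, the patched default bounds the wait before failover to the History Server at roughly one minute instead of fifteen.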