Jason Lowe created MAPREDUCE-6303:
-------------------------------------

             Summary: Read timeout when retrying a fetch error can be fatal to a reducer
                 Key: MAPREDUCE-6303
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Jason Lowe
            Priority: Blocker

If a reducer encounters an error while fetching from a node and then hits a read timeout while trying to re-establish the connection, the reducer can fail. The read timeout exception can leak to the top of the Fetcher thread, which causes the reduce task to tear down. This type of error can repeat across reducer attempts, causing jobs to fail because of a single bad node.
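
To illustrate the failure mode described above, here is a minimal, self-contained Java sketch. It is not the Hadoop Fetcher code: the class FetchRetrySketch, the Connector interface, and connectWithRetry are hypothetical names invented for this example. It only shows the general idea that a timeout raised during a reconnect attempt should be handled as one more failed attempt against the host, instead of propagating out of the thread that is doing the fetch.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class FetchRetrySketch {

    /** Hypothetical stand-in for opening a connection to a shuffle host. */
    public interface Connector {
        void connect() throws IOException;
    }

    /**
     * Tries to connect up to maxAttempts times. Any IOException, including a
     * SocketTimeoutException raised while retrying, counts as a failed attempt
     * and is not allowed to escape to the caller.
     *
     * @return true if a connection was established, false if every attempt failed
     */
    public static boolean connectWithRetry(Connector connector, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                connector.connect();
                return true;
            } catch (IOException e) {
                // SocketTimeoutException is a subclass of IOException,
                // so a read timeout on a retry is caught here as well.
                System.err.println("attempt " + attempt + " failed: " + e);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated bad node: a generic fetch error first, then read timeouts.
        Connector badNode = () -> {
            calls[0]++;
            if (calls[0] == 1) {
                throw new IOException("fetch error");
            }
            throw new SocketTimeoutException("Read timed out");
        };
        boolean connected = connectWithRetry(badNode, 3);
        System.out.println("connected=" + connected + " attempts=" + calls[0]);
    }
}
```

In this sketch the caller receives a plain false result for the bad host and can decide what to do next (for example, report the host as failed and move on), so a single unresponsive node does not take down the whole task.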