Jason Lowe created MAPREDUCE-6303:
-------------------------------------

             Summary: Read timeout when retrying a fetch error can be fatal to 
a reducer
                 Key: MAPREDUCE-6303
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Jason Lowe
            Priority: Blocker


If a reducer encounters an error while fetching from a node and then hits a 
read timeout while trying to re-establish the connection, the reducer can 
fail.  The read timeout exception can leak to the top of the Fetcher thread, 
which causes the reduce task to tear down.  This type of error can repeat 
across reducer attempts, causing jobs to fail due to a single bad node.
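
To illustrate the failure mode, here is a minimal, self-contained sketch of 
handling a read timeout during the reconnect instead of letting it propagate 
out of the thread.  This is not the actual Fetcher code: the class 
FetchRetrySketch, the Connector interface, and the retryConnect and 
failedHosts names are all hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Hadoop.
public class FetchRetrySketch {

    /** Stand-in for whatever opens the connection to a map output host. */
    interface Connector {
        void connect(String host) throws IOException;
    }

    // Hosts whose reconnect attempt failed, kept so they can be retried later.
    private final List<String> failedHosts = new ArrayList<>();

    /**
     * Tries to re-establish a connection after an earlier fetch error.
     * Any IOException, including SocketTimeoutException (a subclass of
     * IOException), is handled here rather than escaping to the caller.
     */
    boolean retryConnect(Connector connector, String host) {
        try {
            connector.connect(host);
            return true;
        } catch (IOException e) {
            // Record the bad host instead of tearing down the whole task.
            failedHosts.add(host);
            return false;
        }
    }

    List<String> failedHosts() {
        return failedHosts;
    }

    public static void main(String[] args) {
        FetchRetrySketch fetcher = new FetchRetrySketch();
        // Simulate a node that times out on the reconnect attempt.
        Connector timingOut = host -> {
            throw new SocketTimeoutException("Read timed out");
        };
        boolean ok = fetcher.retryConnect(timingOut, "badnode");
        System.out.println("reconnected=" + ok
            + " failedHosts=" + fetcher.failedHosts());
    }
}
```

The point of the sketch is only that the timeout is treated like any other 
fetch failure for that host, so one bad node does not take down the reducer.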



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
