[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719142#comment-13719142 ]

Jason Lowe commented on MAPREDUCE-5251:
---------------------------------------

Thanks Ashwin.  I think the patch is almost there, but I noticed that when we
log an error we don't say anything at all about the error itself -- we should
at least log the .getMessage() of the error if we're going to bother logging
that there was an error.  Also, the handling of the unknown-host error text is
somewhat misleading -- one could interpret "unknown" as referring to the local
error that occurred rather than to the fact that we couldn't look up the node
name.
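
As a rough illustration of the two logging points above, here is a minimal Java sketch. It is not the MAPREDUCE-5251 patch: the class and method names are hypothetical, and it uses java.util.logging purely to stay self-contained, not because the MapReduce shuffle code does.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.logging.Logger;

// Hypothetical helper, not part of Hadoop.
final class LocalErrorReporting {
    private static final Logger LOG =
        Logger.getLogger(LocalErrorReporting.class.getName());

    // Resolve the local node name. If the lookup fails, say explicitly that it
    // is the host name that could not be resolved, so the text cannot be
    // mistaken for a description of the error itself.
    static String localHostDescription() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "<host name lookup failed>";
        }
    }

    // Build the log text; always include the error's own message.
    static String describeLocalError(String mapAttemptId, String host, Throwable error) {
        return "Local error on " + host + " while shuffling output of "
            + mapAttemptId + ": " + error.getMessage();
    }

    // Hypothetical call site: log what went wrong, not just that something did.
    static void logLocalError(String mapAttemptId, Throwable error) {
        LOG.severe(describeLocalError(mapAttemptId, localHostDescription(), error));
    }
}
```

Falling back to an explicit "<host name lookup failed>" rather than a bare "unknown" keeps the two separate problems (the local error and the failed name lookup) distinguishable in the log.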
                
> Reducer should not implicate map attempt if it has insufficient space to 
> fetch map output
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5251
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.7, 2.0.4-alpha
>            Reporter: Jason Lowe
>            Assignee: Ashwin Shankar
>         Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, 
> MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt
>
>
> A job can fail if a reducer happens to run on a node with insufficient space
> to hold a map attempt's output.  The reducer keeps reporting the map attempt
> as bad, and if the map attempt ends up being re-launched too many times
> before the reducer decides that it may be the real problem, the job can fail.
> In that scenario it would be better to re-launch the reduce attempt, which
> will hopefully run on another node that has sufficient space to complete
> the shuffle.  Reporting the map attempt as bad and re-launching the map task
> doesn't change the fact that the reducer can't hold the output.
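
The policy the description argues for can be sketched as below. This is a hypothetical model, not Hadoop's shuffle code: the class, the enum, and the idea of comparing the map output size against the reducer's free space before fetching are all assumptions made for illustration.

```java
// Hypothetical model of the decision, not Hadoop code.
final class ShuffleFailurePolicy {
    enum Action { CONTINUE, REPORT_MAP_ATTEMPT, FAIL_REDUCE_ATTEMPT }

    static Action decide(long mapOutputBytes, long localFreeBytes, boolean remoteReadFailed) {
        if (mapOutputBytes > localFreeBytes) {
            // The problem is local: re-running the map will not give this
            // node more room, so fail the reduce attempt and let it be
            // rescheduled, ideally on a node with enough space.
            return Action.FAIL_REDUCE_ATTEMPT;
        }
        if (remoteReadFailed) {
            // Only a failure on the map side should implicate the map attempt.
            return Action.REPORT_MAP_ATTEMPT;
        }
        return Action.CONTINUE;
    }
}
```

Checking local capacity first means a reducer that cannot hold the output never blames the map attempt, even if the remote read also happened to fail.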

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
