[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560123#comment-13560123
 ] 

Jason Lowe commented on MAPREDUCE-4951:
---------------------------------------

Note that I'm not sure whether the fix belongs in YARN or should be left to the 
AM to sort out.  YARN could implement preemption by asking the AM to kill the 
container on the scheduler's behalf (so the AM definitely knows why the 
container is being killed, since it's the one giving the final order to the 
NM), or the AM could work around the race by waiting for the final container 
status even though the task reported failure.  Either way, there are some 
failure modes to work out, e.g. the AM losing connectivity to the NM.
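
To make the second option concrete, here is a rough sketch of an AM-side 
workaround that holds back its verdict on a task-reported failure until the 
final container status arrives.  Everything in it is hypothetical 
(DeferredFailureTracker, Verdict and the method names are not MR AM classes), 
the -100 value is taken from the issue description, and the timeout fallback 
is only one possible answer to the lost-connectivity case, not a settled 
design.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of the MR AM: defers the verdict on a task-reported failure. */
final class DeferredFailureTracker {
    /** Exit status that the issue description associates with preemption. */
    static final int PREEMPTED_EXIT_STATUS = -100;

    enum Verdict { PENDING, KILLED, FAILED }

    // containerId -> time (ms) at which the task reported failure
    private final Map<String, Long> pendingSince = new HashMap<>();
    private final long timeoutMillis;

    DeferredFailureTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** The task reported failure; remember it instead of counting it right away. */
    void onTaskReportedFailure(String containerId, long nowMillis) {
        pendingSince.putIfAbsent(containerId, nowMillis);
    }

    /** The final container status arrived, so the deferred failure can be resolved. */
    Verdict onContainerCompleted(String containerId, int exitStatus) {
        if (pendingSince.remove(containerId) == null) {
            throw new IllegalStateException("no deferred failure for " + containerId);
        }
        return exitStatus == PREEMPTED_EXIT_STATUS ? Verdict.KILLED : Verdict.FAILED;
    }

    /**
     * Periodic check. If no final status shows up within the timeout (assumed
     * here to cover cases such as lost connectivity), fall back to FAILED.
     */
    Verdict check(String containerId, long nowMillis) {
        Long since = pendingSince.get(containerId);
        if (since == null) {
            throw new IllegalStateException("no deferred failure for " + containerId);
        }
        if (nowMillis - since >= timeoutMillis) {
            pendingSince.remove(containerId);
            return Verdict.FAILED;
        }
        return Verdict.PENDING;
    }
}
```

In this sketch a task failure counts against the job only once 
onContainerCompleted or check returns FAILED; a KILLED verdict would simply 
cause the task attempt to be rescheduled.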
                
> Container preemption interpreted as task failure
> ------------------------------------------------
>
>                 Key: MAPREDUCE-4951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mr-am, mrv2
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951.patch
>
>
> When YARN reports a completed container to the MR AM, the AM always interprets 
> the completion as a failure.  This can lead to a job failing because too many 
> of its tasks failed, when in fact they only failed because the scheduler 
> preempted them.  MR needs to recognize the special exit code value of -100 and 
> interpret it as a container being killed rather than a container failure.
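
The check the description asks for could look roughly like the sketch below.  
It is self-contained and does not use the real YARN or MR AM types: 
ContainerOutcome and ContainerExitClassifier are made-up names, and treating 
exit status 0 as success is an assumption.  Only the -100 value comes from 
the description above.

```java
/** Hypothetical outcome categories; not the actual MR AM types. */
enum ContainerOutcome { SUCCEEDED, KILLED, FAILED }

final class ContainerExitClassifier {
    /** Exit status that the issue description says marks a preempted container. */
    static final int PREEMPTED_EXIT_STATUS = -100;

    /** Maps a completed container's exit status to an outcome. */
    static ContainerOutcome classify(int exitStatus) {
        if (exitStatus == 0) {
            return ContainerOutcome.SUCCEEDED;   // assumption: 0 means a clean exit
        }
        if (exitStatus == PREEMPTED_EXIT_STATUS) {
            return ContainerOutcome.KILLED;      // preempted, so not the task's fault
        }
        return ContainerOutcome.FAILED;
    }

    /** Only genuine failures should count toward the job's failed-task limit. */
    static boolean countsTowardFailureLimit(int exitStatus) {
        return classify(exitStatus) == ContainerOutcome.FAILED;
    }
}
```

With this split, ContainerExitClassifier.countsTowardFailureLimit(-100) is 
false, so a burst of preemptions would not by itself push a job over its 
failure limit.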

