[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584174#comment-13584174
 ] 

Hudson commented on MAPREDUCE-4951:
---

Integrated in Hadoop-Yarn-trunk #135 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/135/])
MAPREDUCE-4951. Container preemption interpreted as task failure. 
Contributed by Sandy Ryza. (Revision 1448615)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448615
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java


 Container preemption interpreted as task failure
 

 Key: MAPREDUCE-4951
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mr-am, mrv2
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.0.4-beta

 Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, 
 MAPREDUCE-4951.patch


 When YARN reports a completed container to the MR AM, it always interprets it 
 as a failure.  This can lead to a job failing because too many of its tasks 
 failed, when in fact they only failed because the scheduler preempted them.
 MR needs to recognize the special exit code value of -100 and interpret it as 
 a container being killed instead of a container failure.
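
 As an illustration of the behaviour requested above, here is a minimal sketch.
 It is not the code from the attached patches: the class, enum and method names
 are hypothetical, and only the value -100 comes from the issue description. It
 shows why the distinction matters, since a killed attempt should not use up the
 task's failure budget.

```java
// Hypothetical sketch: names are illustrative, not Hadoop APIs.
final class PreemptionAwareTask {
    // Special exit code named in the issue description.
    static final int PREEMPTED_EXIT_STATUS = -100;

    enum Outcome { KILLED, FAILED }

    private final int maxFailures;   // assumed failure budget before the task is given up
    private int failedAttempts = 0;

    PreemptionAwareTask(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    // Map a completed container's exit status to an attempt outcome.
    // Every status other than -100 is treated as a failure in this sketch.
    static Outcome classify(int exitStatus) {
        return exitStatus == PREEMPTED_EXIT_STATUS ? Outcome.KILLED : Outcome.FAILED;
    }

    // Record a completed container; only real failures count against the budget.
    Outcome onContainerCompleted(int exitStatus) {
        Outcome outcome = classify(exitStatus);
        if (outcome == Outcome.FAILED) {
            failedAttempts++;
        }
        return outcome;
    }

    boolean hasFailed() {
        return failedAttempts >= maxFailures;
    }

    public static void main(String[] args) {
        PreemptionAwareTask task = new PreemptionAwareTask(4);
        for (int i = 0; i < 10; i++) {
            task.onContainerCompleted(PREEMPTED_EXIT_STATUS); // ten preemptions in a row
        }
        System.out.println("failed after preemptions: " + task.hasFailed());
    }
}
```

 Without the special case in classify, the ten preemptions in main would exceed
 the budget of four and the task would be reported as failed.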

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584238#comment-13584238
 ] 

Hudson commented on MAPREDUCE-4951:
---

Integrated in Hadoop-Hdfs-trunk #1324 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1324/])
MAPREDUCE-4951. Container preemption interpreted as task failure. 
Contributed by Sandy Ryza. (Revision 1448615)

 Result = FAILURE
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448615
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java




[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583138#comment-13583138
 ] 

Hudson commented on MAPREDUCE-4951:
---

Integrated in Hadoop-trunk-Commit #3372 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3372/])
MAPREDUCE-4951. Container preemption interpreted as task failure. 
Contributed by Sandy Ryza. (Revision 1448615)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448615
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java




[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583177#comment-13583177
 ] 

Hudson commented on MAPREDUCE-4951:
---

Integrated in Hadoop-Mapreduce-trunk #1351 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/])
MAPREDUCE-4951. Container preemption interpreted as task failure. 
Contributed by Sandy Ryza. (Revision 1448615)

 Result = FAILURE
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448615
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java




[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-25 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562607#comment-13562607
 ] 

Tom White commented on MAPREDUCE-4951:
--

+1 on the latest patch.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562142#comment-13562142
 ] 

Jason Lowe commented on MAPREDUCE-4951:
---

Agree that solving MAPREDUCE-4955 is separate, sorry for the extra noise.  I 
just wanted to point out that even with this patch there will still be spurious 
failures if the task notifies the AM before the AM sees the container status 
from the RM.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560736#comment-13560736
 ] 

Jason Lowe commented on MAPREDUCE-4951:
---

bq. having the RM ask the AM to kill the container in case of preemption would 
likely not work as the AM cannot be trusted.

Agreed, I was thinking of exactly the alternative you propose, where preemption 
potentially has two phases: a "please, AM, preempt that container you have" 
request, with a watchdog timer so that the RM kills it forcefully if the AM does 
not comply in a reasonable amount of time.  This eliminates the race where the container can 
fail because of the preemption and provides a way for the AM to potentially 
checkpoint the state of the container for faster recovery.  However it does 
mean the meantime latency for container availability would be higher since the 
AM will have a grace period before relinquishing the resources.
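
For illustration, the two-phase idea described above could be modelled roughly 
as follows. This is only a sketch under stated assumptions: the class, its 
methods and the explicit millisecond clock are hypothetical and do not 
correspond to any YARN API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of two-phase preemption; not YARN code.
final class TwoPhasePreemption {
    enum State { REQUESTED, RELEASED_BY_AM, KILLED_BY_RM }

    private final long graceMillis;                              // assumed grace period given to the AM
    private final Map<String, Long> deadlines = new HashMap<>(); // watchdog deadline per container
    private final Map<String, State> states = new HashMap<>();

    TwoPhasePreemption(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Phase 1: the RM asks the AM to give the container back and arms the watchdog.
    void requestPreemption(String containerId, long nowMillis) {
        states.put(containerId, State.REQUESTED);
        deadlines.put(containerId, nowMillis + graceMillis);
    }

    // The AM complied (possibly after checkpointing) before the deadline.
    void amReleased(String containerId) {
        if (states.get(containerId) == State.REQUESTED) {
            states.put(containerId, State.RELEASED_BY_AM);
            deadlines.remove(containerId);
        }
    }

    // Phase 2: the RM forcefully kills containers whose grace period has expired.
    void tick(long nowMillis) {
        for (Map.Entry<String, Long> e : deadlines.entrySet()) {
            if (nowMillis >= e.getValue()) {
                states.put(e.getKey(), State.KILLED_BY_RM);
            }
        }
        deadlines.keySet().removeIf(id -> states.get(id) == State.KILLED_BY_RM);
    }

    State stateOf(String containerId) {
        return states.get(containerId);
    }

    public static void main(String[] args) {
        TwoPhasePreemption rm = new TwoPhasePreemption(5_000);
        rm.requestPreemption("container_1", 0);
        rm.requestPreemption("container_2", 0);
        rm.amReleased("container_1");   // cooperative AM
        rm.tick(6_000);                 // watchdog fires for container_2
        System.out.println(rm.stateOf("container_1") + " " + rm.stateOf("container_2"));
    }
}
```

The grace period in this sketch is the extra latency mentioned above: the 
resources are not guaranteed to be free until the deadline has passed.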



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-23 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560903#comment-13560903
 ] 

Bikas Saha commented on MAPREDUCE-4951:
---

We might be digressing from this jira here. But I really don't think the 2-step 
approach is worth its complexity. The main scenario where it makes sense is 
when the task has the ability to checkpoint its work before getting preempted. I 
haven't seen this capability outside of basic research prototypes. It's much 
simpler to have the preemption be an RM-only action. We do need to fix the 
action and information loop so that AMs can get correct information about the 
infrastructure's actions.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-23 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561449#comment-13561449
 ] 

Sandy Ryza commented on MAPREDUCE-4951:
---

It doesn't seem to me that either approach would conflict with this patch at 
the moment.  While this code might get rewritten in the future, under the 
current preemption mechanism, when MR is explicitly told that a container was 
preempted, it should not count it as failed.  Does anybody disagree?



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559593#comment-13559593
 ] 

Tom White commented on MAPREDUCE-4951:
--

The change looks good to me, but shouldn't the other exit codes be covered too 
or are they already being treated as task killed? The ones mentioned above plus 
-1000 (INVALID_CONTAINER_EXIT_STATUS), -101 (DISKS_FAILED).

Also looks like you added testTaskPreemption without any code.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559849#comment-13559849
 ] 

Bikas Saha commented on MAPREDUCE-4951:
---

From what I see in the code, both killing due to exceeding memory and killing 
under command from the RM (preemption) eventually end up sending a 
ContainerKillEvent, and the two are differentiated only by a String 
diagnostic message. That event ends up causing a signal to be sent to the 
actual running container. Based on that, I am not very sure that the exit 
codes are being explicitly used by the NM to differentiate between RM killings, 
memory killings, etc.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559978#comment-13559978
 ] 

Jason Lowe commented on MAPREDUCE-4951:
---

As the comment in FairScheduler.preemptResources states, I too am unsure if 
the preemption is translated into a kill command to the NM by the RM directly 
or if the scheduler is relying on the AM to see the finished container status 
from the RM and issue the kill to the NM.

If it's the latter, then the container will be killed after the AM has already 
determined the container status correctly.  If the RM really is cleaning up the 
container and turning that into a kill command for the NM, then we've got 
problems.  The task itself could fail as the JVM tears down from a kill command 
and report that failure to the AM via the task umbilical *before* the AM 
discovers via the heartbeat to the RM that the container was preempted.  A 
similar race occurs now when an NM kills a container for being over limits, see 
MAPREDUCE-4955.
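
To make the ordering problem concrete, here is a small sketch of the race. It 
assumes a simplified AM that treats the first report it receives as final; the 
class and method names are hypothetical, and only the value -100 comes from the 
issue description.

```java
// Hypothetical model of the race; names are illustrative, not Hadoop APIs.
final class NaiveAttemptTracker {
    enum Verdict { RUNNING, FAILED, KILLED }

    // Special exit code named in the issue description.
    static final int PREEMPTED_EXIT_STATUS = -100;

    private Verdict verdict = Verdict.RUNNING;

    // The task's JVM reports an error over the umbilical while it is being torn down.
    void onUmbilicalFailure() {
        if (verdict == Verdict.RUNNING) {
            verdict = Verdict.FAILED;   // first report wins
        }
    }

    // The container status arrives on the AM's heartbeat to the RM.
    void onContainerStatus(int exitStatus) {
        if (verdict == Verdict.RUNNING) {
            verdict = (exitStatus == PREEMPTED_EXIT_STATUS) ? Verdict.KILLED : Verdict.FAILED;
        }
    }

    Verdict verdict() {
        return verdict;
    }

    public static void main(String[] args) {
        NaiveAttemptTracker umbilicalFirst = new NaiveAttemptTracker();
        umbilicalFirst.onUmbilicalFailure();
        umbilicalFirst.onContainerStatus(PREEMPTED_EXIT_STATUS);

        NaiveAttemptTracker heartbeatFirst = new NaiveAttemptTracker();
        heartbeatFirst.onContainerStatus(PREEMPTED_EXIT_STATUS);
        heartbeatFirst.onUmbilicalFailure();

        // Same preemption, different verdicts depending on message order.
        System.out.println(umbilicalFirst.verdict() + " vs " + heartbeatFirst.verdict());
    }
}
```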



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560031#comment-13560031
 ] 

Bikas Saha commented on MAPREDUCE-4951:
---

I think it's the RM killing the container via the NM. The RM kill command ends 
up sending a list of containers to clean up in the NM heartbeat. The NM kills 
the containers in that list by sending a container_kill event to each container.
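
The kill path described above can be sketched as a toy model. The class and 
method names here are hypothetical and are not the NodeManager's actual 
classes; the sketch only assumes that the heartbeat response carries a list of 
container IDs to clean up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the kill path; not NodeManager code.
final class HeartbeatCleanupModel {
    private final Set<String> running = new LinkedHashSet<>();
    private final List<String> killEvents = new ArrayList<>();

    void launch(String containerId) {
        running.add(containerId);
    }

    // The RM piggybacks a list of containers to clean up on the heartbeat response;
    // the NM sends a kill event to each listed container it is still running.
    void onHeartbeatResponse(List<String> containersToCleanup) {
        for (String id : containersToCleanup) {
            if (running.remove(id)) {
                killEvents.add("container_kill:" + id);
            }
        }
    }

    List<String> killEvents() {
        return killEvents;
    }

    boolean isRunning(String containerId) {
        return running.contains(containerId);
    }

    public static void main(String[] args) {
        HeartbeatCleanupModel nm = new HeartbeatCleanupModel();
        nm.launch("container_1");
        nm.launch("container_2");
        // container_9 is unknown to this NM and is ignored.
        nm.onHeartbeatResponse(Arrays.asList("container_2", "container_9"));
        System.out.println(nm.killEvents());
    }
}
```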



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560090#comment-13560090
 ] 

Sandy Ryza commented on MAPREDUCE-4951:
---

Tom,
Regarding the other special exit codes, my opinion is that they don't merit the 
same treatment. In general, and if I understand correctly how things worked in 
MR1, failed tasks should be considered guilty until proven innocent, with 
innocent meaning killed explicitly by the RM, and guilty meaning anything else.

Bikas,
That's correct that a ContainerKillEvent is issued in both cases.  However, if 
I understand correctly, when a container is explicitly killed by the RM, the 
special value of -100 is reported to the AM instead of any exit code reported 
by the NM.  You can look for references to 
YarnConfiguration.ABORTED_CONTAINER_EXIT_STATUS to see when/how this works.
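
As a sketch of the "guilty until proven innocent" policy argued for above, the 
following applies it to the three special exit codes named in this thread. The 
constants are local copies whose names and values are taken from the comments 
here; the class and method are hypothetical and this is not the patch itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical policy sketch; constants mirror the names and values quoted in this thread.
final class ExitStatusPolicy {
    static final int ABORTED_CONTAINER_EXIT_STATUS = -100;   // killed explicitly by the RM
    static final int DISKS_FAILED = -101;
    static final int INVALID_CONTAINER_EXIT_STATUS = -1000;

    // "Innocent" means killed explicitly by the RM; anything else is "guilty".
    static boolean countsAsTaskFailure(int exitStatus) {
        return exitStatus != ABORTED_CONTAINER_EXIT_STATUS;
    }

    public static void main(String[] args) {
        Map<String, Integer> codes = new LinkedHashMap<>();
        codes.put("ABORTED_CONTAINER_EXIT_STATUS", ABORTED_CONTAINER_EXIT_STATUS);
        codes.put("DISKS_FAILED", DISKS_FAILED);
        codes.put("INVALID_CONTAINER_EXIT_STATUS", INVALID_CONTAINER_EXIT_STATUS);
        for (Map.Entry<String, Integer> e : codes.entrySet()) {
            System.out.println(e.getKey() + " (" + e.getValue() + ") counts as failure: "
                    + countsAsTaskFailure(e.getValue()));
        }
    }
}
```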




[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560107#comment-13560107
 ] 

Jason Lowe commented on MAPREDUCE-4951:
---

bq. That's correct that a ContainerKillEvent is issued in both cases. However, 
if I understand correctly, when a container is explicitly killed by the RM, the 
special value of -100 is reported to the AM instead of any exit code reported 
by the NM.

If the RM is indeed telling the NM to kill the container then we would have a 
race with tasks failing due to the kill-shutdown notifying the AM before the AM 
sees the container status from the RM.




[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560111#comment-13560111
 ] 

Sandy Ryza commented on MAPREDUCE-4951:
---

bq. If the RM is indeed telling the NM to kill the container then we would have 
a race with tasks failing due to the kill-shutdown notifying the AM before the 
AM sees the container status from the RM.

Oh I didn't realize that.  Should I file a YARN JIRA for that?  Or is it 
something that MR should be handling?



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560116#comment-13560116
 ] 

Jason Lowe commented on MAPREDUCE-4951:
---

Arguably it's yet another instance of the race already covered by 
MAPREDUCE-4955 as I mentioned above.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560123#comment-13560123
 ] 

Jason Lowe commented on MAPREDUCE-4951:
---

Note that I'm not sure whether the fix belongs in YARN or should be left to the 
AM to sort out.  YARN could implement preemption by asking the AM to kill the 
container on the scheduler's behalf (so the AM definitely knows why the 
container is being killed, since it's the one giving the final order to the 
NM), or the AM could work around the race by waiting for the final container 
status even though the task reported failure.  There are some issues to work 
out with respect to failure modes, e.g. the AM losing connectivity to the NM.
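
The second option, where the AM waits for the final container status, might 
look roughly like the sketch below. All names are hypothetical, and the timeout 
hook is an assumption added here to cover the case where the final status 
never arrives; only the value -100 comes from the issue description.

```java
// Hypothetical sketch of deferring the verdict; names are illustrative, not Hadoop APIs.
final class DeferringAttemptTracker {
    enum Verdict { RUNNING, PENDING_FAILURE, FAILED, KILLED }

    // Special exit code named in the issue description.
    static final int PREEMPTED_EXIT_STATUS = -100;

    private Verdict verdict = Verdict.RUNNING;

    // The task reported failure over the umbilical: remember it, but do not decide yet.
    void onUmbilicalFailure() {
        if (verdict == Verdict.RUNNING) {
            verdict = Verdict.PENDING_FAILURE;
        }
    }

    // The final container status settles the question either way.
    void onContainerStatus(int exitStatus) {
        if (verdict == Verdict.RUNNING || verdict == Verdict.PENDING_FAILURE) {
            verdict = (exitStatus == PREEMPTED_EXIT_STATUS) ? Verdict.KILLED : Verdict.FAILED;
        }
    }

    // Assumed fallback: if no status arrives in time, accept the task's own report.
    void onStatusTimeout() {
        if (verdict == Verdict.PENDING_FAILURE) {
            verdict = Verdict.FAILED;
        }
    }

    Verdict verdict() {
        return verdict;
    }

    public static void main(String[] args) {
        DeferringAttemptTracker attempt = new DeferringAttemptTracker();
        attempt.onUmbilicalFailure();                       // JVM teardown error arrives first
        attempt.onContainerStatus(PREEMPTED_EXIT_STATUS);   // RM later reports the preemption
        System.out.println(attempt.verdict());
    }
}
```

The cost of this approach is that a genuine failure is only confirmed once the 
container status arrives or the assumed timeout expires.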



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560174#comment-13560174
 ] 

Hitesh Shah commented on MAPREDUCE-4951:


@Jason, having the RM ask the AM to kill the container in case of preemption 
would likely not work as the AM cannot be trusted. Obviously, there could be a 
different approach where the RM informs the AM that a particular container will 
be preempted soon but the RM eventually would need to trigger a kill for that 
container after a certain delay if it is still up.
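
A minimal sketch of the "warn first, then force the kill after a delay" idea described above. This is only an illustration under stated assumptions: the class and method names are hypothetical, time is passed in explicitly rather than read from a clock, and nothing here reflects how the RM actually implements preemption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: all names here are illustrative.
final class PreemptionGraceSketch {
    private final long graceMillis;
    // containerId -> time at which the RM stops waiting for the AM.
    private final Map<String, Long> deadlines = new HashMap<>();

    PreemptionGraceSketch(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // RM tells the AM the container will be preempted soon and starts the timer.
    void warn(String containerId, long nowMillis) {
        deadlines.putIfAbsent(containerId, nowMillis + graceMillis);
    }

    // The AM gave the container back on its own; no forced kill is needed.
    void released(String containerId) {
        deadlines.remove(containerId);
    }

    // Containers still up past their deadline; the RM kills these itself,
    // so correctness never depends on the AM cooperating.
    List<String> expire(long nowMillis) {
        List<String> toKill = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                toKill.add(entry.getKey());
                it.remove();
            }
        }
        return toKill;
    }

    public static void main(String[] args) {
        PreemptionGraceSketch rm = new PreemptionGraceSketch(1000L);
        rm.warn("container_1", 0L);
        rm.warn("container_2", 0L);
        rm.released("container_1");
        System.out.println(rm.expire(1000L));
    }
}
```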


 



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560265#comment-13560265
 ] 

Sandy Ryza commented on MAPREDUCE-4951:
---

Uploaded a patch that removes the vestigial testTaskPreemption.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560281#comment-13560281
 ] 

Hadoop QA commented on MAPREDUCE-4951:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566059/MAPREDUCE-4951-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3264//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3264//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-21 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558987#comment-13558987
 ] 

Bikas Saha commented on MAPREDUCE-4951:
---

Will that differentiate between a preemption kill and a resource kill (e.g., 
out of memory)?



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559023#comment-13559023
 ] 

Sandy Ryza commented on MAPREDUCE-4951:
---

I believe in that case the exit code will be FORCE_KILLED(137) or 
TERMINATED(143) (from ContainerExecutor.java).
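
To make the distinction concrete, here is a small sketch of how an AM might tell these exit statuses apart. It is not the patch itself: the class, enum, and constant names are hypothetical, and only the numeric values (-100, 137, 143) come from this thread. Treating signal kills as still counting toward the task failure limit is an assumption made for the example.

```java
// Hypothetical sketch, not Hadoop code: all names here are illustrative.
final class ExitStatusSketch {
    // Special value from the issue description: container killed by the framework.
    static final int ABORTED_BY_FRAMEWORK = -100;
    // Conventional shell exit codes for signal deaths: 128 + signal number.
    static final int FORCE_KILLED = 137; // 128 + SIGKILL (9)
    static final int TERMINATED = 143;   // 128 + SIGTERM (15)

    enum Outcome { SUCCEEDED, KILLED_BY_FRAMEWORK, KILLED_BY_SIGNAL, FAILED }

    static Outcome classify(int exitStatus) {
        switch (exitStatus) {
            case 0:
                return Outcome.SUCCEEDED;
            case ABORTED_BY_FRAMEWORK:
                return Outcome.KILLED_BY_FRAMEWORK;
            case FORCE_KILLED:
            case TERMINATED:
                return Outcome.KILLED_BY_SIGNAL;
            default:
                return Outcome.FAILED;
        }
    }

    // Assumption for this sketch: only framework kills (e.g. preemption) are
    // exempt from the failure count; signal kills are still treated as failures.
    static boolean countsTowardFailureLimit(int exitStatus) {
        Outcome outcome = classify(exitStatus);
        return outcome == Outcome.FAILED || outcome == Outcome.KILLED_BY_SIGNAL;
    }

    public static void main(String[] args) {
        int[] samples = {0, -100, 137, 143, 1};
        for (int status : samples) {
            System.out.println(status + " -> " + classify(status)
                + ", counts as failure: " + countsTowardFailureLimit(status));
        }
    }
}
```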



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559160#comment-13559160
 ] 

Sandy Ryza commented on MAPREDUCE-4951:
---

The new patch includes a test and uses the constant from YarnConfiguration 
instead of a hardcoded -100.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559170#comment-13559170
 ] 

Hadoop QA commented on MAPREDUCE-4951:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12565866/MAPREDUCE-4951-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3260//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3260//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558506#comment-13558506
 ] 

Hadoop QA commented on MAPREDUCE-4951:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12565727/MAPREDUCE-4951.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3256//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3256//console

This message is automatically generated.
