[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538745#comment-13538745
 ] 

Hudson commented on MAPREDUCE-4833:
---

Integrated in Hadoop-Yarn-trunk #73 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/73/])
MAPREDUCE-4833. Task can get stuck in FAIL_CONTAINER_CLEANUP. Contributed 
by Robert Parker (Revision 1425167)

 Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425167
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncherImpl.java


 Task can get stuck in FAIL_CONTAINER_CLEANUP
 

 Key: MAPREDUCE-4833
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Robert Parker
Priority: Critical
 Fix For: 2.0.3-alpha, 0.23.6

 Attachments: MAPREDUCE4833-1.patch, MAPREDUCE4833-2.patch, 
 MAPREDUCE4833.patch


 If an NM goes down and the AM still tries to launch a container on it, the 
 ContainerLauncherImpl can get stuck in an RPC timeout.  At the same time, the 
 RM may notice that the NM has gone away and inform the AM, which triggers a 
 TA_FAILMSG.  If the TA_FAILMSG arrives at the TaskAttemptImpl before the 
 TA_CONTAINER_LAUNCH_FAILED message, the task attempt will try to kill the 
 container, but the ContainerLauncherImpl will not send back a 
 TA_CONTAINER_CLEANED event, leaving the attempt stuck.
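
 The ordering problem described above can be illustrated with a small, 
 hypothetical model. This is only a sketch: the class StuckAttemptSketch, the 
 replay method and the reduced set of states are assumptions made for 
 illustration and are not the real TaskAttemptImpl state machine. Only the 
 event and state names are taken from the report.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Hypothetical, heavily simplified model of the attempt-side state machine.
// Only the event and state names come from the report; the rest is illustrative.
final class StuckAttemptSketch {
    enum AttemptState { ASSIGNED, FAIL_CONTAINER_CLEANUP, FAILED }
    enum Event { TA_FAILMSG, TA_CONTAINER_LAUNCH_FAILED, TA_CONTAINER_CLEANED }

    /**
     * Replays events in arrival order and returns the final attempt state.
     * launcherAnswersCleanup models whether the launcher replies to a
     * cleanup request with TA_CONTAINER_CLEANED.
     */
    static AttemptState replay(boolean launcherAnswersCleanup, Event... arrivals) {
        Queue<Event> queue = new ArrayDeque<>(Arrays.asList(arrivals));
        AttemptState state = AttemptState.ASSIGNED;
        while (!queue.isEmpty()) {
            Event event = queue.poll();
            if (state == AttemptState.ASSIGNED) {
                if (event == Event.TA_FAILMSG) {
                    // The attempt asks for container cleanup and waits for the reply.
                    state = AttemptState.FAIL_CONTAINER_CLEANUP;
                    if (launcherAnswersCleanup) {
                        queue.add(Event.TA_CONTAINER_CLEANED);
                    }
                } else if (event == Event.TA_CONTAINER_LAUNCH_FAILED) {
                    state = AttemptState.FAILED;
                }
            } else if (state == AttemptState.FAIL_CONTAINER_CLEANUP) {
                // Assumption: only TA_CONTAINER_CLEANED moves the attempt forward here.
                if (event == Event.TA_CONTAINER_CLEANED) {
                    state = AttemptState.FAILED;
                }
            }
        }
        return state;
    }

    public static void main(String[] args) {
        // TA_FAILMSG wins the race and the launcher stays silent: the attempt is stuck.
        System.out.println(replay(false, Event.TA_FAILMSG, Event.TA_CONTAINER_LAUNCH_FAILED));
        // Same ordering, but the launcher answers the cleanup request.
        System.out.println(replay(true, Event.TA_FAILMSG, Event.TA_CONTAINER_LAUNCH_FAILED));
    }
}
```

 In this model the attempt only remains in FAIL_CONTAINER_CLEANUP when 
 TA_FAILMSG arrives first and no TA_CONTAINER_CLEANED reply follows; if 
 TA_CONTAINER_LAUNCH_FAILED arrives first, the attempt fails normally.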

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538774#comment-13538774
 ] 

Hudson commented on MAPREDUCE-4833:
---

Integrated in Hadoop-Hdfs-0.23-Build #471 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/471/])
svn merge -c 1425167 FIXES: MAPREDUCE-4833. Task can get stuck in 
FAIL_CONTAINER_CLEANUP. Contributed by Robert Parker (Revision 1425169)

 Result = UNSTABLE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425169
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncherImpl.java




[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538789#comment-13538789
 ] 

Hudson commented on MAPREDUCE-4833:
---

Integrated in Hadoop-Hdfs-trunk #1262 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1262/])
MAPREDUCE-4833. Task can get stuck in FAIL_CONTAINER_CLEANUP. Contributed 
by Robert Parker (Revision 1425167)

 Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425167
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncherImpl.java




[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538803#comment-13538803
 ] 

Hudson commented on MAPREDUCE-4833:
---

Integrated in Hadoop-Mapreduce-trunk #1292 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1292/])
MAPREDUCE-4833. Task can get stuck in FAIL_CONTAINER_CLEANUP. Contributed 
by Robert Parker (Revision 1425167)

 Result = FAILURE
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425167
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncherImpl.java




[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538339#comment-13538339
 ] 

Hadoop QA commented on MAPREDUCE-4833:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12562116/MAPREDUCE4833.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 2016 javac 
compiler warnings (more than the trunk's current 2015 warnings).

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3158//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3158//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3158//console

This message is automatically generated.

 Task can get stuck in FAIL_CONTAINER_CLEANUP
 

 Key: MAPREDUCE-4833
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Robert Parker
Priority: Critical
 Attachments: MAPREDUCE4833-23.patch, MAPREDUCE4833.patch




[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538405#comment-13538405
 ] 

Hadoop QA commented on MAPREDUCE-4833:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12562133/MAPREDUCE4833-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 2016 javac 
compiler warnings (more than the trunk's current 2015 warnings).

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3159//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3159//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3159//console

This message is automatically generated.

 Task can get stuck in FAIL_CONTAINER_CLEANUP
 

 Key: MAPREDUCE-4833
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Robert Parker
Priority: Critical
 Attachments: MAPREDUCE4833-1.patch, MAPREDUCE4833.patch




[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538440#comment-13538440
 ] 

Hadoop QA commented on MAPREDUCE-4833:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12562139/MAPREDUCE4833-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3161//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3161//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538472#comment-13538472
 ] 

Jason Lowe commented on MAPREDUCE-4833:
---

+1, thanks for writing a test.



[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538489#comment-13538489
 ] 

Hudson commented on MAPREDUCE-4833:
---

Integrated in Hadoop-trunk-Commit #3151 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3151/])
MAPREDUCE-4833. Task can get stuck in FAIL_CONTAINER_CLEANUP. Contributed 
by Robert Parker (Revision 1425167)

 Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1425167
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncherImpl.java




[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536431#comment-13536431
 ] 

Jason Lowe commented on MAPREDUCE-4833:
---

Patch approach looks good, but we should add a unit test in 
TestContainerLauncherImpl.  It should be fairly straightforward to mock up a 
container launch failing and then verify that a CONTAINER_REMOTE_CLEANUP event 
causes a TA_CONTAINER_CLEANED event to be dispatched back even though the 
container has already failed.

 Task can get stuck in FAIL_CONTAINER_CLEANUP
 

 Key: MAPREDUCE-4833
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Assignee: Robert Parker
Priority: Critical
 Attachments: MAPREDUCE4833-23.patch




[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-18 Thread Robert Parker (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535288#comment-13535288
 ] 

Robert Parker commented on MAPREDUCE-4833:
--

Previously, if the Container was already DONE, kill returned without sending 
an event (essentially a no-op). This patch sends a TA_CONTAINER_CLEANED event 
in all cases.
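
The change in kill behaviour can be sketched roughly as follows. This is a 
hypothetical, simplified stand-in and not the actual ContainerLauncherImpl 
code: the class ContainerKillSketch, the alwaysReportCleanup flag (used only 
to contrast old and new behaviour) and the string-based event list are all 
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the launcher's per-container bookkeeping.
final class ContainerKillSketch {
    enum State { PREP, RUNNING, DONE, FAILED }

    static final class Container {
        private State state = State.PREP;
        // Illustrative switch: false models the old behaviour, true the patched one.
        private final boolean alwaysReportCleanup;
        // Events are recorded as strings instead of being dispatched to a real handler.
        final List<String> dispatched = new ArrayList<>();

        Container(boolean alwaysReportCleanup) {
            this.alwaysReportCleanup = alwaysReportCleanup;
        }

        /** Models a launch that fails, for example on an RPC timeout to a dead NM. */
        void launchFailed() {
            state = State.FAILED;
            dispatched.add("TA_CONTAINER_LAUNCH_FAILED");
        }

        void kill() {
            // Assumption: both DONE and FAILED count as "finished" in this sketch.
            boolean finished = state == State.DONE || state == State.FAILED;
            if (finished && !alwaysReportCleanup) {
                return; // old behaviour: silent no-op, the attempt never hears back
            }
            if (!finished) {
                state = State.DONE; // stand-in for actually stopping the container
            }
            dispatched.add("TA_CONTAINER_CLEANED"); // reply to the waiting attempt
        }
    }

    public static void main(String[] args) {
        Container old = new Container(false);
        old.launchFailed();
        old.kill();
        System.out.println("old:     " + old.dispatched);

        Container patched = new Container(true);
        patched.launchFailed();
        patched.kill();
        System.out.println("patched: " + patched.dispatched);
    }
}
```

With the flag set to false, a kill that follows a failed launch records no 
further event; with it set to true, the kill always ends with a 
TA_CONTAINER_CLEANED reply, which is the behaviour the patch describes.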



[jira] [Commented] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535323#comment-13535323
 ] 

Hadoop QA commented on MAPREDUCE-4833:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12561562/MAPREDUCE4833-23.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3137//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3137//console

This message is automatically generated.
