[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-24 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215581#comment-13215581 ]

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Hdfs-0.23-Build #178 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/178/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
 

 Key: MAPREDUCE-3738
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1, 0.24.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-3738.patch, livehistdump.txt


 If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught 
 Throwable such as the OutOfMemoryError in the case I saw), nodemanager 
 shutdown will hang.  The NM calls AppLogAggregatorImpl.join() during 
 shutdown to make sure log aggregation has completed, and that method 
 internally waits for an atomic boolean to be set by the log aggregation 
 thread to indicate it has finished.  Since the thread has already died from 
 the uncaught error, the boolean is never set, and the NM hangs during 
 shutdown, logging a line like this every second:

   2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
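
The failure mode can be modeled in a few lines of Java. This is a hypothetical simplification, not the Hadoop source and not necessarily what the committed patch does: the class AggregatorSketch, its aggregationWork parameter and the flag name are illustrative. The worker thread signals completion through an AtomicBoolean and join() polls that flag once per second. If the flag were set only after the work returned normally, an Error thrown by the work would kill the thread first and join() would loop forever; setting the flag in a finally block is one way to close that gap.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the pattern described in the report;
// not the real AppLogAggregatorImpl and not necessarily the committed fix.
class AggregatorSketch implements Runnable {
  private final AtomicBoolean aggregationFinished = new AtomicBoolean(false);
  private final Runnable aggregationWork; // hypothetical stand-in for the real work

  AggregatorSketch(Runnable aggregationWork) {
    this.aggregationWork = aggregationWork;
  }

  @Override
  public void run() {
    try {
      aggregationWork.run();
    } finally {
      // Executes even when aggregationWork throws an Error such as
      // OutOfMemoryError, so join() can always observe completion.
      aggregationFinished.set(true);
    }
  }

  boolean isFinished() {
    return aggregationFinished.get();
  }

  // Polls the flag once per second, mirroring the log message in the report.
  void join() {
    while (!aggregationFinished.get()) {
      System.out.println("Waiting for aggregation to complete");
      try {
        Thread.sleep(1000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and stop waiting
        return;
      }
    }
  }

  public static void main(String[] args) {
    AggregatorSketch aggregator =
        new AggregatorSketch(() -> { throw new OutOfMemoryError("simulated"); });
    Thread thread = new Thread(aggregator, "app-log-aggregator");
    thread.start(); // the simulated error still kills the thread after finally runs
    aggregator.join(); // returns because the finally block set the flag
    System.out.println("joined");
  }
}
```

Running main prints the simulated OutOfMemoryError stack trace from the dying thread, at most one "Waiting for aggregation to complete" line, and then "joined". An alternative design would have join() also stop waiting once the aggregator thread is no longer alive (Thread.isAlive()); the finally block instead keeps the completion signal in one place.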

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-24 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215612#comment-13215612 ]

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Mapreduce-0.23-Build #206 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/206/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061




[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Hadoop QA (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215077#comment-13215077 ]

Hadoop QA commented on MAPREDUCE-3738:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515804/MAPREDUCE-3738.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1917//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1917//console

This message is automatically generated.





[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215320#comment-13215320 ]

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Common-0.23-Commit #587 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/587/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061




[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215325#comment-13215325 ]

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Hdfs-0.23-Commit #574 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/574/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061




[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Hudson (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215335#comment-13215335 ]

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #589 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/589/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = ABORTED
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061




[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-01-26 Thread Mahadev konar (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194196#comment-13194196 ]

Mahadev konar commented on MAPREDUCE-3738:
--

Jason,
 Is there a bug filed for the OOM? What was the reason for it?





[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-01-26 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194205#comment-13194205 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3738:


Originally, when I wrote this, I had the same suspicion about the join. But 
later I made sure all exceptions were caught and that the boolean gets set in 
all possible cases. OOM and other errors are one thing that didn't occur to me.

Can you debug why you ran into the OOM? We definitely need to fix that, 
irrespective of how we want to handle other errors.
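
The gap described above comes from Java's throwable hierarchy: OutOfMemoryError extends Error, not Exception, so a handler written as catch (Exception e) never sees it. The hypothetical CatchScopeDemo class below (not Hadoop code) makes that concrete; its outer catch (Throwable) exists only to report what escaped and is not a pattern to copy into production code.

```java
// Hypothetical demo: which handler sees a throwable thrown by a task?
class CatchScopeDemo {
  static String classify(Runnable task) {
    try {
      try {
        task.run();
        return "completed";
      } catch (Exception e) {
        // Checked and unchecked exceptions (RuntimeException subclasses) land here.
        return "caught by catch (Exception)";
      }
    } catch (Throwable t) {
      // Errors such as OutOfMemoryError skip the Exception handler entirely.
      return "escaped catch (Exception): " + t.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(classify(() -> { throw new IllegalStateException("boom"); }));
    System.out.println(classify(() -> { throw new OutOfMemoryError("simulated"); }));
  }
}
```

Its main prints "caught by catch (Exception)" for the IllegalStateException and "escaped catch (Exception): OutOfMemoryError" for the simulated OOM, which is why a completion flag set only inside exception handlers can be skipped.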





[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-01-26 Thread Jason Lowe (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194204#comment-13194204 ]

Jason Lowe commented on MAPREDUCE-3738:
---

No bug for the OOM yet; unfortunately, the cluster was redeployed before I could 
grab a full heap dump.  I do have the jmap -histo:live output from one of the 
nodemanagers but haven't had a chance to go through it yet to see whether it 
helps pinpoint where the leak is.





[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-01-26 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194207#comment-13194207 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3738:


Comment race ;) Even the stack trace from the OOM would help.
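
One way to make sure such a stack trace is captured is a default uncaught-exception handler. By default, the JVM's ThreadGroup handling prints an uncaught throwable to standard error, which may not end up in the NM log. The sketch below is a hypothetical illustration (the StackTraceCapture class and its captured list are not Hadoop APIs): it installs a handler via Thread.setDefaultUncaughtExceptionHandler that records the dying thread's name and trace. Under a genuine OutOfMemoryError the handler itself may fail to allocate, so treat this as best-effort diagnostics rather than a guarantee.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: record the stack trace of any thread that dies from
// an uncaught Throwable, e.g. an OutOfMemoryError in an aggregation thread.
class StackTraceCapture {
  static final List<String> captured = Collections.synchronizedList(new ArrayList<>());

  static void install() {
    Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
      StringWriter trace = new StringWriter();
      error.printStackTrace(new PrintWriter(trace));
      String entry = "Thread " + thread.getName() + " died: " + trace;
      captured.add(entry);
      System.err.println(entry); // a real service would route this to its logger
    });
  }

  public static void main(String[] args) throws InterruptedException {
    install();
    Thread worker = new Thread(
        () -> { throw new OutOfMemoryError("simulated"); }, "app-log-aggregator");
    worker.start();
    worker.join(); // the handler runs on the dying thread before join() returns
    System.out.println("captured traces: " + captured.size());
  }
}
```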
