[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215612#comment-13215612
 ] 

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Mapreduce-0.23-Build #206 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/206/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = FAILURE
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


> NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
> 
>
> Key: MAPREDUCE-3738
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, nodemanager
> Affects Versions: 0.23.1, 0.24.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Fix For: 0.23.2
>
> Attachments: MAPREDUCE-3738.patch, livehistdump.txt
>
>
> If an AppLogAggregator thread dies unexpectedly (e.g.: uncaught exception 
> like OutOfMemoryError in the case I saw) then this will lead to a hang during 
> nodemanager shutdown.  The NM calls AppLogAggregatorImpl.join() during 
> shutdown to make sure log aggregation has completed, and that method 
> internally waits for an atomic boolean to be set by the log aggregation 
> thread to indicate it has finished.  Since the thread was killed off earlier 
> due to an uncaught exception, the boolean will never be set and the NM hangs 
> during shutdown repeating something like this every second in the log file:
> 2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
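
The hang described above can be reproduced in miniature. The following is only a sketch under stated assumptions: `AggregatorHangSketch`, `Worker`, and `flagSetAfterDeath` are hypothetical names, not the real AppLogAggregatorImpl code, and the OutOfMemoryError is simulated rather than provoked. It shows that a completion flag set as the last statement of run() is never set when the thread dies from an uncaught Error, while setting it in a finally block is one possible way to guarantee it. The attached MAPREDUCE-3738.patch is not shown in this thread and may take a different approach.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AggregatorHangSketch {

    /** Hypothetical stand-in for the aggregation thread body. */
    static class Worker implements Runnable {
        final AtomicBoolean finished = new AtomicBoolean(false);
        private final boolean useFinally;

        Worker(boolean useFinally) {
            this.useFinally = useFinally;
        }

        @Override
        public void run() {
            if (useFinally) {
                try {
                    aggregate();
                } finally {
                    finished.set(true); // runs even when an Error propagates
                }
            } else {
                aggregate();
                finished.set(true); // skipped when aggregate() throws
            }
        }

        private void aggregate() {
            // Simulates the thread dying; no memory is actually exhausted.
            throw new OutOfMemoryError("simulated");
        }
    }

    /** Lets the worker thread die, then reports whether the flag was set. */
    static boolean flagSetAfterDeath(boolean useFinally) {
        Worker worker = new Worker(useFinally);
        Thread thread = new Thread(worker, "aggregator-sketch");
        // Swallow the simulated error so no stack trace is printed.
        thread.setUncaughtExceptionHandler((t, err) -> { });
        thread.start();
        try {
            thread.join(); // Thread.join() returns once the thread has died
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.finished.get();
    }

    public static void main(String[] args) {
        System.out.println("flag set without finally: " + flagSetAfterDeath(false));
        System.out.println("flag set with finally:    " + flagSetAfterDeath(true));
    }
}
```

A caller that polls only the flag, as the description says the join method does, would wait forever in the first case. Thread.join() on the thread itself, as used in the sketch, returns regardless of how the thread ended.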

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215581#comment-13215581
 ] 

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Hdfs-0.23-Build #178 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/178/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215335#comment-13215335
 ] 

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #589 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/589/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = ABORTED
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215325#comment-13215325
 ] 

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Hdfs-0.23-Commit #574 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/574/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215320#comment-13215320
 ] 

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Common-0.23-Commit #587 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/587/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Siddharth Seth (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215310#comment-13215310
 ] 

Siddharth Seth commented on MAPREDUCE-3738:
---

+1. Looks good.






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215077#comment-13215077
 ] 

Hadoop QA commented on MAPREDUCE-3738:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12515804/MAPREDUCE-3738.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1917//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1917//console

This message is automatically generated.






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-01-26 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194207#comment-13194207
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3738:


Comment race;) Even the stack trace during OOM will help.






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-01-26 Thread Jason Lowe (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194204#comment-13194204
 ] 

Jason Lowe commented on MAPREDUCE-3738:
---

No bug for the OOM yet; unfortunately the cluster was re-deployed before I 
could grab a full heap dump.  I do have the jmap -histo:live output from one of 
the nodemanagers but haven't had a chance to go through it yet to see if it 
helps pinpoint where the leak is.






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-01-26 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194205#comment-13194205
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3738:


Originally when I wrote this, I had the same suspicion about the join. But 
later, I made sure all exceptions were caught and that the boolean gets set in 
all possible cases. OOM/errors are one thing that didn't occur to me.

Can you debug why you ran into the OOM? We definitely need to fix that, 
irrespective of how we want to handle other errors.
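
To illustrate why catching all exceptions does not cover the OOM case: in Java, OutOfMemoryError extends Error, not Exception, so a `catch (Exception e)` handler never sees it. The sketch below uses hypothetical names (`CatchScopeSketch`, `handledBy`) and is not taken from the NodeManager code; it only reports which handler intercepts a given throwable.

```java
public class CatchScopeSketch {

    /** Reports which handler intercepts the given throwable. */
    static String handledBy(Throwable toThrow) {
        try {
            try {
                throw toThrow;
            } catch (Exception e) {
                // Reached for Exception and its subclasses only.
                return "caught by catch (Exception)";
            }
        } catch (Throwable t) {
            // Reached for Error subclasses such as OutOfMemoryError.
            return "escaped catch (Exception)";
        }
    }

    public static void main(String[] args) {
        System.out.println(handledBy(new IllegalStateException("simulated")));
        System.out.println(handledBy(new OutOfMemoryError("simulated")));
    }
}
```

Cleanup that must run in every case, such as setting a completion flag, is therefore usually placed in a finally block rather than after a catch (Exception) handler.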






[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-01-26 Thread Mahadev konar (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194196#comment-13194196
 ] 

Mahadev konar commented on MAPREDUCE-3738:
--

Jason,
 Is there a bug for OOM? What was the reason for that?

