NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
----------------------------------------------------------------------------

                 Key: MAPREDUCE-3738
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2, nodemanager
    Affects Versions: 0.23.1, 0.24.0
            Reporter: Jason Lowe


If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught 
exception such as the OutOfMemoryError I saw), the nodemanager hangs during 
shutdown.  The NM calls AppLogAggregatorImpl.join() during shutdown to make 
sure log aggregation has completed, and that method waits for an atomic 
boolean that the log aggregation thread sets when it finishes.  Because the 
thread was already killed by the uncaught exception, the boolean is never 
set, and the NM hangs during shutdown, logging something like this every 
second:

2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
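
The failure mode described above can be reproduced in isolation.  The 
following is only a minimal sketch, not the actual AppLogAggregatorImpl 
code: the class and method names (AggregatorJoinSketch, Aggregator, 
runAndJoin) are hypothetical, the uncaught error is simulated with an 
IllegalStateException rather than a real OutOfMemoryError, and join() is 
given a timeout purely so the demonstration terminates (the method in the 
report waits indefinitely).  It also shows one possible mitigation, setting 
the flag in a finally block, which is an assumption and not necessarily the 
fix adopted for this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AggregatorJoinSketch {

    /** Hypothetical stand-in for a log aggregation worker. */
    static class Aggregator implements Runnable {
        private final AtomicBoolean finished = new AtomicBoolean(false);
        private final boolean setFlagInFinally;

        Aggregator(boolean setFlagInFinally) {
            this.setFlagInFinally = setFlagInFinally;
        }

        @Override
        public void run() {
            if (setFlagInFinally) {
                // Possible mitigation: the flag is set even if aggregation throws.
                try {
                    doAggregation();
                } finally {
                    finished.set(true);
                }
            } else {
                // Pattern from the report: the flag is only set on normal completion.
                doAggregation();
                finished.set(true); // never reached when doAggregation() throws
            }
        }

        /** Simulates the thread dying from an uncaught exception. */
        private void doAggregation() {
            throw new IllegalStateException("simulated aggregation failure");
        }

        /**
         * Polls the flag like the join() described in the report, but gives up
         * after timeoutMillis so this sketch cannot hang.
         * Returns true if the flag was observed, false on timeout.
         */
        boolean join(long timeoutMillis) throws InterruptedException {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (!finished.get()) {
                if (System.nanoTime() - deadline >= 0) {
                    return false;
                }
                Thread.sleep(10);
            }
            return true;
        }
    }

    /** Runs a worker until its thread is dead, then waits on the flag. */
    public static boolean runAndJoin(boolean setFlagInFinally)
            throws InterruptedException {
        Aggregator aggregator = new Aggregator(setFlagInFinally);
        Thread worker = new Thread(aggregator, "aggregator-sketch");
        // Swallow the simulated failure so it does not clutter stderr.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        worker.join(); // the worker thread has terminated at this point
        return aggregator.join(200);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flag set after normal completion only: joined = "
                + runAndJoin(false));
        System.out.println("flag set in finally block: joined = "
                + runAndJoin(true));
    }
}
```

In the first case the wait only ends because of the sketch's timeout; 
without one, the caller would spin forever on a flag that a dead thread can 
no longer set, which matches the shutdown hang reported here.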
