[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415567#comment-13415567
 ] 

Jason Lowe commented on MAPREDUCE-4448:
---------------------------------------

The log from one of the crashes is shown below.  Note the error during log 
aggregation init at app startup, which later leads to a fatal error when the 
app finishes.

{noformat}
[main]2012-07-13 20:35:21,019 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Start request for container_1342210962593_0007_01_000001 by user x
[IPC Server handler 0 on 8041]2012-07-13 20:35:21,043 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Creating a new application reference for app application_1342210962593_0007
[IPC Server handler 0 on 8041]2012-07-13 20:35:21,050 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Application application_1342210962593_0007 transitioned from NEW to INITING
[AsyncDispatcher event handler]2012-07-13 20:35:21,051 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Adding container_1342210962593_0007_01_000001 to application 
application_1342210962593_0007
[AsyncDispatcher event handler]2012-07-13 20:35:21,062 ERROR 
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
as:x (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed 
[Caused by GSSException: No valid credentials provided (Mechanism level: Failed 
to find any Kerberos tgt)]
[AsyncDispatcher event handler]2012-07-13 20:35:21,063 WARN 
org.apache.hadoop.ipc.Client: Exception encountered while connecting to the 
server : javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
[AsyncDispatcher event handler]2012-07-13 20:35:21,063 ERROR 
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
as:x (auth:SIMPLE) cause:java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]
[AsyncDispatcher event handler]2012-07-13 20:35:21,063 ERROR 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
 Failed to create user dir
[hdfs://xx:8020/mapred/logs/x] while processing app 
application_1342210962593_0007
[AsyncDispatcher event handler]2012-07-13 20:35:21,064 ERROR 
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
as:x (auth:SIMPLE) cause:java.io.IOException: Failed on local exception: 
java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed 
[Caused by GSSException: No valid credentials provided (Mechanism level: Failed 
to find any Kerberos tgt)]; Host Details : local host is: "xx/xx.xx.xx.xx"; 
destination host is: "x":8020; 
[AsyncDispatcher event handler]2012-07-13 20:35:21,065 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Application application_1342210962593_0007 transitioned from INITING to 
FINISHING_CONTAINERS_WAIT
[AsyncDispatcher event handler]2012-07-13 20:35:21,067 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_1342210962593_0007_01_000001 transitioned from NEW to DONE
[AsyncDispatcher event handler]2012-07-13 20:35:21,067 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Removing container_1342210962593_0007_01_000001 from application 
application_1342210962593_0007
[AsyncDispatcher event handler]2012-07-13 20:35:21,069 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
 Application application_1342210962593_0007 transitioned from 
FINISHING_CONTAINERS_WAIT to APPLICATION_RESOURCES_CLEANINGUP
[AsyncDispatcher event handler]2012-07-13 20:35:21,070 FATAL 
org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
[AsyncDispatcher event handler]org.apache.hadoop.yarn.YarnException:
Application is not initialized yet for container_1342210962593_0007_01_000001
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:347)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:381)
        at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:65)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
        at java.lang.Thread.run(Thread.java:619)
2012-07-13 20:35:21,071 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: 
Exiting, bbye..
[AsyncDispatcher event handler]2012-07-13 20:35:21,072 WARN 
org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
[AsyncDispatcher event handler]java.lang.InterruptedException
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
        at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
        at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:69)
        at java.lang.Thread.run(Thread.java:619)
2012-07-13 20:35:21,072 INFO org.apache.hadoop.yarn.service.AbstractService: 
Service:Dispatcher is stopped.
[Thread-1]2012-07-13 20:35:21,073 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:8042
[Thread-1]2012-07-13 20:35:21,075 INFO 
org.apache.hadoop.yarn.service.AbstractService: 
Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
[Thread-1]2012-07-13 20:35:21,075 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 8041
[Thread-1]2012-07-13 20:35:21,076 INFO org.apache.hadoop.ipc.Server: Stopping 
IPC Server listener on 8041
[IPC Server listener on 8041]2012-07-13 20:35:21,077 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
 waiting for pending aggregation during exit
[Thread-1]2012-07-13 20:35:21,077 INFO org.apache.hadoop.ipc.Server: Stopping 
IPC Server Responder
[IPC Server Responder]2012-07-13 20:35:21,077 INFO 
org.apache.hadoop.yarn.service.AbstractService: 
Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
 is stopped.
{noformat}

The problem is that one application with a bad token can bring down every 
nodemanager that ran a container for it.  MAPREDUCE-4302 fixed a similar crash 
when log aggregation failed to start, but it missed this crash in the cleanup 
case.
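
The failure mode can be illustrated with a minimal sketch.  This is not the 
actual LogAggregationService code and not the patch for this issue: the class 
LogAggregationSketch and its methods initApp and stopContainer are hypothetical 
names, and warning and skipping instead of throwing is only one possible way to 
handle the cleanup case.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified stand-in for a per-node log aggregation service. */
class LogAggregationSketch {
    /** Apps whose aggregation initialized successfully, keyed by application id. */
    private final Map<String, StringBuilder> aggregators = new ConcurrentHashMap<>();

    /**
     * Simulates aggregation init at app startup. When credentials are bad
     * (as with the missing Kerberos TGT in the log), nothing is registered.
     */
    boolean initApp(String appId, boolean credentialsValid) {
        if (!credentialsValid) {
            return false; // init failed: no aggregator exists for this app
        }
        aggregators.put(appId, new StringBuilder());
        return true;
    }

    /**
     * Called during cleanup. Throwing here would propagate to the caller
     * (in the log above, the dispatcher thread), so an app whose init
     * failed is logged and skipped instead.
     */
    boolean stopContainer(String appId, String containerId) {
        StringBuilder aggregator = aggregators.get(appId);
        if (aggregator == null) {
            System.err.println("WARN: log aggregation is not initialized for "
                    + containerId + ", skipping");
            return false;
        }
        aggregator.append(containerId).append('\n'); // record the finished container
        return true;
    }

    public static void main(String[] args) {
        LogAggregationSketch service = new LogAggregationSketch();
        service.initApp("application_bad", false);                     // init fails
        service.stopContainer("application_bad", "container_bad_01");  // warns, no crash
        service.initApp("application_ok", true);
        service.stopContainer("application_ok", "container_ok_01");    // recorded
    }
}
```

In this sketch the cleanup path tolerates an application that never finished 
initializing, so one application with bad credentials cannot take down the 
whole service.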
                
> Nodemanager crashes upon application cleanup if aggregation failed to start
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4448
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4448
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>
> When log aggregation is enabled, the nodemanager can crash if log aggregation 
> for an application failed to start.


        
