[ https://issues.apache.org/jira/browse/MAPREDUCE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415567#comment-13415567 ]
Jason Lowe commented on MAPREDUCE-4448:
---------------------------------------
The log from one of the crashes is shown below. Note the error during log
aggregation initialization at application startup, which later leads to a fatal
error when the application is cleaned up.
{noformat}
[main]2012-07-13 20:35:21,019 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1342210962593_0007_01_000001 by user x
[IPC Server handler 0 on 8041]2012-07-13 20:35:21,043 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1342210962593_0007
[IPC Server handler 0 on 8041]2012-07-13 20:35:21,050 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1342210962593_0007 transitioned from NEW to INITING
[AsyncDispatcher event handler]2012-07-13 20:35:21,051 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1342210962593_0007_01_000001 to application application_1342210962593_0007
[AsyncDispatcher event handler]2012-07-13 20:35:21,062 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:x (auth:SIMPLE) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
[AsyncDispatcher event handler]2012-07-13 20:35:21,063 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
[AsyncDispatcher event handler]2012-07-13 20:35:21,063 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:x (auth:SIMPLE) cause:java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
[AsyncDispatcher event handler]2012-07-13 20:35:21,063 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: Failed to create user dir [hdfs://xx:8020/mapred/logs/x] while processing app application_1342210962593_0007
[AsyncDispatcher event handler]2012-07-13 20:35:21,064 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:x (auth:SIMPLE) cause:java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "xx/xx.xx.xx.xx"; destination host is: ""x":8020;
[AsyncDispatcher event handler]2012-07-13 20:35:21,065 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1342210962593_0007 transitioned from INITING to FINISHING_CONTAINERS_WAIT
[AsyncDispatcher event handler]2012-07-13 20:35:21,067 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1342210962593_0007_01_000001 transitioned from NEW to DONE
[AsyncDispatcher event handler]2012-07-13 20:35:21,067 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1342210962593_0007_01_000001 from application application_1342210962593_0007
[AsyncDispatcher event handler]2012-07-13 20:35:21,069 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1342210962593_0007 transitioned from FINISHING_CONTAINERS_WAIT to APPLICATION_RESOURCES_CLEANINGUP
[AsyncDispatcher event handler]2012-07-13 20:35:21,070 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
[AsyncDispatcher event handler]org.apache.hadoop.yarn.YarnException: Application is not initialized yet for container_1342210962593_0007_01_000001
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.stopContainer(LogAggregationService.java:347)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:381)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:65)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:619)
2012-07-13 20:35:21,071 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
[AsyncDispatcher event handler]2012-07-13 20:35:21,072 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
[AsyncDispatcher event handler]java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:69)
	at java.lang.Thread.run(Thread.java:619)
2012-07-13 20:35:21,072 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
[Thread-1]2012-07-13 20:35:21,073 INFO org.mortbay.log: Stopped [email protected]:8042
[Thread-1]2012-07-13 20:35:21,075 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
[Thread-1]2012-07-13 20:35:21,075 INFO org.apache.hadoop.ipc.Server: Stopping server on 8041
[Thread-1]2012-07-13 20:35:21,076 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8041
[IPC Server listener on 8041]2012-07-13 20:35:21,077 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
[Thread-1]2012-07-13 20:35:21,077 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
[IPC Server Responder]2012-07-13 20:35:21,077 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService is stopped.
{noformat}
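To make the failure mode concrete, here is a toy model of a single event-dispatch thread shared by every application on a node. All class names are hypothetical; this is not the YARN API. An application whose aggregation setup failed is never registered, so its later CONTAINER_FINISHED event makes the handler throw, and the dispatcher stops processing events for every application. It assumes, as the FATAL entry above suggests, that an exception escaping a handler is fatal to the dispatcher; in the log the real dispatcher goes on to exit the nodemanager ("Exiting, bbye..") rather than merely stopping.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical, simplified model of the nodemanager event flow; none of these
// classes are the real YARN types.
enum ToyEventType { APPLICATION_STARTED, CONTAINER_FINISHED }

final class ToyEvent {
    final ToyEventType type;
    final String appId;

    ToyEvent(ToyEventType type, String appId) {
        this.type = type;
        this.appId = appId;
    }
}

/** Stand-in for a log-aggregation handler that throws on unknown applications. */
final class StrictAggregationHandler {
    private final Set<String> initializedApps = new HashSet<>();
    private final boolean canCreateRemoteDir; // false models the Kerberos failure

    StrictAggregationHandler(boolean canCreateRemoteDir) {
        this.canCreateRemoteDir = canCreateRemoteDir;
    }

    void handle(ToyEvent event) {
        switch (event.type) {
            case APPLICATION_STARTED:
                // Like "Failed to create user dir": on failure the app is never registered.
                if (canCreateRemoteDir) {
                    initializedApps.add(event.appId);
                }
                break;
            case CONTAINER_FINISHED:
                if (!initializedApps.contains(event.appId)) {
                    throw new IllegalStateException(
                        "Application is not initialized yet for " + event.appId);
                }
                break;
            default:
                break;
        }
    }
}

/** One dispatcher thread serves every application on the node. */
final class ToyDispatcher {
    private final Queue<ToyEvent> queue = new ArrayDeque<>();
    private final StrictAggregationHandler handler;
    private boolean stopped = false;

    ToyDispatcher(StrictAggregationHandler handler) {
        this.handler = handler;
    }

    void post(ToyEvent event) {
        queue.add(event);
    }

    /** Processes events in order until the queue is empty or a handler throws. */
    void drain() {
        while (!stopped && !queue.isEmpty()) {
            ToyEvent event = queue.poll();
            try {
                handler.handle(event);
            } catch (RuntimeException e) {
                // The real dispatcher logs FATAL and exits the process;
                // this toy version only stops processing so the effect is observable.
                System.out.println("FATAL Error in dispatcher thread: " + e.getMessage());
                stopped = true;
            }
        }
    }

    boolean isStopped() { return stopped; }

    int pendingEvents() { return queue.size(); }
}
```

With `canCreateRemoteDir` set to false, an event queued for a second, unrelated application is never processed, which mirrors how one bad application takes the whole nodemanager down.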
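The problem is that one application with a bad token can bring down every
nodemanager that ran a container for it. Because creating the remote log
directory failed (the GSSException above: no Kerberos TGT), LogAggregationService
never initialized aggregation for the application. When the container finished,
LogAggregationService.stopContainer threw a YarnException ("Application is not
initialized yet"); since that exception escaped on the AsyncDispatcher thread,
the dispatcher logged it as FATAL and the nodemanager exited. MAPREDUCE-4302
fixed a similar crash when log aggregation failed to start, but it missed this
crash in the cleanup case.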
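One defensive approach is for the aggregation handler to treat a missing per-application aggregator as a warning rather than an error. This is a minimal sketch under that assumption: `TolerantLogAggregator`, `initApp`, `stopContainer`, and `stopApp` are hypothetical names, not the actual LogAggregationService API or the committed patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of a log-aggregation handler that tolerates applications
 * whose aggregation never started. Names are illustrative, not the YARN API.
 */
final class TolerantLogAggregator {
    private static final Logger LOG =
        Logger.getLogger(TolerantLogAggregator.class.getName());

    // appId -> per-application aggregator state (a placeholder string here)
    private final Map<String, String> aggregators = new HashMap<>();
    int skippedStops = 0;

    /** Registers the app only if initialization (e.g. creating the remote user dir) succeeded. */
    void initApp(String appId, boolean remoteDirCreated) {
        if (remoteDirCreated) {
            aggregators.put(appId, "aggregator-" + appId);
        } else {
            LOG.severe("Failed to create user dir while processing app " + appId);
        }
    }

    /** Returns true if the container's logs were handed to an aggregator. */
    boolean stopContainer(String appId, String containerId) {
        String aggregator = aggregators.get(appId);
        if (aggregator == null) {
            // Aggregation never started for this app: warn and skip instead of
            // throwing, so one bad application cannot stop the dispatcher.
            LOG.warning("Log aggregation is not initialized for " + containerId
                + "; skipping");
            skippedStops++;
            return false;
        }
        return true;
    }

    /** Removing an app that was never registered is a no-op rather than an error. */
    boolean stopApp(String appId) {
        return aggregators.remove(appId) != null;
    }
}
```

Returning a boolean instead of throwing keeps the failure local to the application whose token was bad; the trade-off is that its container logs are not aggregated, and the warning is the only trace of that.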
> Nodemanager crashes upon application cleanup if aggregation failed to start
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-4448
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4448
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, nodemanager
> Affects Versions: 0.23.3, 2.0.1-alpha
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
>
> When log aggregation is enabled, the nodemanager can crash if log aggregation
> for an application failed to start.