AM Container exits with code 2
Hi all,

I have launched an application on a YARN cluster with the following config:

Master (Resource Manager) - 16GB RAM + 8 vCPU
Slave 1 (Node Manager 1) - 8GB RAM + 4 vCPU

Intermittently the AM (2GB, 1 core) exits with code 2 with the following trace. I am not able to find anything about exit code 2. The last log line is:

org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 22504 for container-id container_1469709900068_0002_01_01: 203.8 MB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used

Does this have anything to do with my application logic, or is it possible that it was killed for exceeding the memory limits?

2016-07-28 17:08:50,672 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1469709900068_0002_01_01 and exit code: 2
ExitCodeException exitCode=2:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2016-07-28 17:08:50,674 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2016-07-28 17:08:50,674 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1469709900068_0002_01_01
2016-07-28 17:08:50,674 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 2
2016-07-28 17:08:50,674 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=2:

Thanks,
Rahul Chhiber
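For what it's worth, the ContainersMonitorImpl line above shows the container well under both its physical and virtual memory limits, which argues against a memory kill. A small, self-contained sketch of how an AM might give a first-cut reading of a completed container's exit status (the `ExitCodeHint` class and its messages are mine, not a Hadoop API; the -103/-104 values are believed to match the constants in org.apache.hadoop.yarn.api.records.ContainerExitStatus):

```java
// Hypothetical helper (not part of Hadoop): classify a container exit
// status, as an AM might do when it receives completed containers in
// AMRMClientAsync.CallbackHandler#onContainersCompleted.
public class ExitCodeHint {
    public static String describe(int exitStatus) {
        if (exitStatus == 0) return "success";
        // Believed to mirror ContainerExitStatus.KILLED_EXCEEDED_PMEM / _VMEM
        if (exitStatus == -104) return "killed: exceeded physical memory limit";
        if (exitStatus == -103) return "killed: exceeded virtual memory limit";
        if (exitStatus == 137 || exitStatus == 143) return "killed by signal";
        // Anything else (such as 2 here) is the exit code of the launched
        // command itself, so the container's own stderr/syslog is the place to look.
        return "non-zero exit from the launched command";
    }
}
```

Under this reading, exit code 2 points at the launched process (application logic or launch command) rather than at a YARN memory enforcement kill.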
Node Manager crashes with OutOfMemory error
Hi All,

I am running a Hadoop cluster with the following configuration:

Master (Resource Manager) - 16GB RAM + 8 vCPU
Slave 1 (Node Manager 1) - 8GB RAM + 4 vCPU
Slave 2 (Node Manager 2) - 8GB RAM + 4 vCPU

Memory allocated for container use per slave, i.e. yarn.nodemanager.resource.memory-mb, is 6144. When I launch an application, container allocation and execution succeed, but after executing 1 or 2 jobs on the cluster, one or both Node Manager daemons crash with the following error in the logs:

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
        at java.lang.StringBuffer.append(StringBuffer.java:237)
        at org.apache.hadoop.util.Shell$1.run(Shell.java:511)
2016-07-22 06:54:54,326 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException

We have allocated 1 GB of heap space for each Node Manager daemon, and on average about 3 containers run on each slave node. We have been running Hadoop clusters for a while now, but hadn't faced this issue until recently.

What are the memory sizing recommendations for the Node Manager? As I understand it, the memory used by containers or by the Application Master should have no bearing on Node Manager memory consumption, since they all run in separate JVMs. What could be the possible reasons for high memory consumption in the Node Manager?

NOTE: I tried allocating more heap memory for the Node Manager (2 GB), but the issue still occurs intermittently. Containers getting killed for excess memory consumption is understandable, but if the Node Manager crashes in this manner it would be a serious scalability problem.

Thanks,
Rahul Chhiber
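For reference, the Node Manager daemon's heap is configured independently of the container budget, typically in yarn-env.sh. A sketch of the relevant settings (the 2048 MB value and dump path are examples, not recommendations):

```shell
# yarn-env.sh -- example values only: the daemon heap is separate from
# the per-node container budget (yarn.nodemanager.resource.memory-mb).
export YARN_NODEMANAGER_HEAPSIZE=2048   # MB of heap for the NM daemon itself

# Optional: capture a heap dump on OOM so the retained objects
# (here, the StringBuffer growing inside Shell$1.run) can be inspected.
export YARN_NODEMANAGER_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hadoop-yarn"
```

The stack trace above suggests the heap is being filled while the NM buffers output from a shell command it ran, so a heap dump taken at the crash would show what that buffer contains.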
RE: Application Master fails due to Invalid AMRM token
Hi Xuan,

I applied the patch for YARN-3103 and the issue hasn't occurred since. Thanks for your help! :)

Regards,
Rahul Chhiber

From: Xuan Gong [mailto:xg...@hortonworks.com]
Sent: Friday, February 06, 2015 5:53 AM
To: user@hadoop.apache.org
Subject: Re: Application Master fails due to Invalid AMRM token

Hey, Rahul

This can be related to:
https://issues.apache.org/jira/browse/MAPREDUCE-6230
and
https://issues.apache.org/jira/browse/YARN-3103

You can manually apply these two patches and try again.

Thanks
Xuan Gong

From: bharath vissapragada <bharathvissapragada1...@gmail.com>
Reply-To: user@hadoop.apache.org
Date: Wednesday, February 4, 2015 at 11:40 PM
To: user@hadoop.apache.org
Subject: Re: Application Master fails due to Invalid AMRM token

Might be this: https://issues.apache.org/jira/browse/YARN-2964. This bug was introduced by https://issues.apache.org/jira/browse/YARN-2704.

On Thu, Feb 5, 2015 at 1:04 PM, Rahul Chhiber <rahul.chhi...@cumulus-systems.com> wrote:

Hi all,

I am running a Hadoop cluster of 4 nodes (Hadoop 2.6). I am facing an irregular error relating to an invalid AMRM token that causes my YARN application to crash anywhere from 1 day up to a week after starting. The application launches successfully and runs for an unpredictable period of time, then crashes with the following logs (Appmaster.stdout). When I restart, the application works perfectly for some time before crashing again with the same error.

2015-02-04 12:55:51,394 [AMRM Heartbeater thread] org.apache.hadoop.ipc.Client-DEBUG-The ping interval is 6 ms.
2015-02-04 12:55:51,394 [AMRM Heartbeater thread] org.apache.hadoop.ipc.Client-DEBUG-Connecting to masternode/192.168.143.23:8030
2015-02-04 12:55:51,396 [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation-DEBUG-PrivilegedAction as:user1 (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
2015-02-04 12:55:51,396 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Sending sasl message state: NEGOTIATE
2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Received SASL message state: NEGOTIATE auths { method: TOKEN mechanism: DIGEST-MD5 protocol: serverId: default challenge: realm=\default\,nonce=\FjsVVqgBotmE1OIpCE6f/KVmiuM3ixIolXg/l5et\,qop=\auth\,charset=utf-8,algorithm=md5-sess }
2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB info:org.apache.hadoop.yarn.security.SchedulerSecurityInfo$1@4ae70093
2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.yarn.security.AMRMTokenSelector-DEBUG-Looking for a token with service 192.168.143.23:8030
2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.yarn.security.AMRMTokenSelector-DEBUG-Token kind is YARN_AM_RM_TOKEN and the token's service name is 192.168.143.23:8030
2015-02-04 12:55:51,397 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Creating SASL DIGEST-MD5(TOKEN) client to authenticate to service at default
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Use TOKEN authentication for protocol ApplicationMasterProtocolPB
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting username: Cg0KCQgBEM3bh860KRACEMyF5q/9/wE=
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting userPassword
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting realm: default
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] org.apache.hadoop.security.SaslRpcClient-DEBUG-Sending sasl message state: INITIATE token: charset=utf-8,username=\Cg0KCQgBEM3bh860KRACEMyF5q/9/wE=\,realm=\default\,nonce=\FjsVVqgBotmE1OIpCE6f/KVmiuM3ixIolXg/l5et\,nc=0001,cnonce=\dETUXCWNF7Aw2ZReFDsKF5jj9bKcyvoQYEJDU9N5\,digest-uri=\/default\,maxbuf=65536,response=1bdc86e7222c86e0692d30db6bec1479,qop=auth auths { method: TOKEN mechanism: DIGEST-MD5 protocol: serverId: default }
2015-02-04 12:55:51,400 [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation-DEBUG-PrivilegedActionException as:user1 (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_02
2015-02-04 12:55
Application Master fails due to Invalid AMRM token
) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
2015-02-04 12:55:51,423 [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl-ERROR-Stopping callback due to:
org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1422871621069_0001_02
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
        at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy12.allocate(Unknown Source)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:333)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_02
        at org.apache.hadoop.ipc.Client.call(Client.java:1468)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy11.allocate(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
        ... 8 more
2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.service.AbstractService-DEBUG-Service: org.apache.hadoop.yarn.client.api.async.AMRMClientAsync entered state STOPPED
2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.service.AbstractService-DEBUG-Service: org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl entered state STOPPED
2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.ipc.Client-DEBUG-stopping client from cache: org.apache.hadoop.ipc.Client@552d7308
2015-02-04 12:56:03,544 [org.eclipse.jetty.server.session.HashSessionManager@425af55fTimer] org.eclipse.jetty.server.session-DEBUG-Scavenging sessions at 1423054563544

Any help is greatly appreciated.

Thanks,
Rahul Chhiber
RE: How to handle Container crash in YARN
Sajid,

Check the logs for your container at $HADOOP_INSTALL_DIR/logs/userlogs/application_id/container_id. Note that these will be present on the node where your Application Master is running.

If the container was not able to start, you might get something by printing the stack trace in the onStartContainerError(ContainerId containerId, Throwable t) callback method of the NMClientAsync.CallbackHandler interface. You should always capture the exit status of the container inside the onContainersCompleted(List<ContainerStatus> completedContainers) callback method of the AMRMClientAsync.CallbackHandler interface.

Please see the source of the Distributed Shell application on GitHub for an example of how this is done:
https://github.com/apache/hadoop-common/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell

Regards,
Rahul Chhiber

From: Sajid Syed [mailto:sajid...@gmail.com]
Sent: Thursday, December 18, 2014 9:44 AM
To: user@hadoop.apache.org
Subject: How to handle Container crash in YARN

Hello,

Can anyone please explain how to handle/resolve a container crash in YARN Hadoop?

Thanks
Sajid Syed
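The log location described in the reply above can be composed mechanically. A trivial, self-contained sketch (ContainerLogPath is a hypothetical helper of mine, not a Hadoop class; the path layout is the default described above):

```java
// Hypothetical helper (not a Hadoop API): build the per-container log
// directory path described in the reply above:
//   $HADOOP_INSTALL_DIR/logs/userlogs/<application_id>/<container_id>
public class ContainerLogPath {
    public static String of(String hadoopInstallDir, String applicationId, String containerId) {
        return hadoopInstallDir + "/logs/userlogs/" + applicationId + "/" + containerId;
    }
}
```

The directory typically holds the container's stdout, stderr, and syslog files, which is where a crash's root cause usually shows up.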