AM Container exits with code 2

2016-07-29 Thread Rahul Chhiber
Hi all,

I have launched an application on a YARN cluster with the following configuration:
Master (Resource Manager) - 16GB RAM + 8 vCPU
Slave 1 (Node manager 1) - 8GB RAM + 4 vCPU

Intermittently, the AM (2 GB, 1 core) exits with code 2 and the following
trace. I have not been able to find any documentation on exit code 2.

The last log line is:
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 22504 for container-id container_1469709900068_0002_01_01: 203.8 MB of 2 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used

Does this have anything to do with my application logic, or is it possible that
the container is being killed for exceeding its memory limits?

2016-07-28 17:08:50,672 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1469709900068_0002_01_01 and exit code: 2
ExitCodeException exitCode=2:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-07-28 17:08:50,674 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2016-07-28 17:08:50,674 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1469709900068_0002_01_01
2016-07-28 17:08:50,674 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 2
2016-07-28 17:08:50,674 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=2:
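For what it's worth, an ExitCodeException from container-launch means the container's launch script itself returned that code, so the real cause usually sits in the container's own stderr/stdout rather than the NodeManager log. A sketch of how to pull those logs (the application id is taken from the container id above; the log directory is the default and may differ on your cluster):

```shell
# Aggregate logs for all containers of the application (works after the
# app finishes, if log aggregation is enabled):
yarn logs -applicationId application_1469709900068_0002

# Or read the per-container log directory directly on the node that ran
# the container (default local log dir; path is an assumption):
ls /var/log/hadoop-yarn/userlogs/application_1469709900068_0002/
```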

Thanks,
Rahul Chhiber



Node Manager crashes with OutOfMemory error

2016-07-26 Thread Rahul Chhiber
Hi All,

I am running a Hadoop cluster with the following configuration:

Master (Resource Manager) - 16GB RAM + 8 vCPU
Slave 1 (Node manager 1) - 8GB RAM + 4 vCPU
Slave 2 (Node manager 2) - 8GB RAM + 4 vCPU

Memory allocated for container use per slave, i.e.
yarn.nodemanager.resource.memory-mb, is 6144 MB.

When I launch an application, container allocation and execution succeed, but
after executing 1 or 2 jobs on the cluster, one or both NodeManager daemons
crash with the following error in the logs:

"java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuffer.append(StringBuffer.java:237)
at org.apache.hadoop.util.Shell$1.run(Shell.java:511)
2016-07-22 06:54:54,326 INFO org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException"

We have allocated 1 GB of heap space for each NodeManager daemon. On average
about 3 containers run per slave node. We have been running Hadoop clusters for
a while now, but had not faced this issue until recently. What are the memory
sizing recommendations for the NodeManager? As per my understanding, the memory
used by containers or by the ApplicationMaster should have no bearing on
NodeManager memory consumption, since they all run in separate JVMs. What could
be the possible reasons for high memory consumption by the NodeManager?

NOTE: I tried allocating more heap memory for the NodeManager (2 GB), but the
issue still occurs intermittently. Containers getting killed for excess memory
consumption is understandable, but if the NodeManager itself crashes in this
manner it is a serious scalability problem.
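For reference, the NodeManager daemon heap mentioned above is typically set in yarn-env.sh; a minimal sketch (the env var names are the standard ones read by the yarn startup scripts, the values are illustrative, and the heap-dump path is an assumption):

```shell
# yarn-env.sh fragment: raise the NodeManager daemon heap (in MB).
export YARN_NODEMANAGER_HEAPSIZE=2048

# Optionally capture a heap dump on OOM, so you can inspect what is
# actually filling the NM heap (e.g. with jhat/Eclipse MAT):
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/hadoop-yarn"
```

The stack trace above (Shell$1.run appending to a StringBuffer until the heap is exhausted) suggests the NM was buffering a very large amount of output from a forked shell command, so a heap dump is the most direct way to confirm what is growing.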

Thanks,
Rahul Chhiber



RE: Application Master fails due to Invalid AMRM token

2015-02-26 Thread Rahul Chhiber
Hi Xuan,

I applied the patch for YARN-3103 and the issue hasn't occurred since. Thanks 
for your help! :)

Regards,
Rahul Chhiber

From: Xuan Gong [mailto:xg...@hortonworks.com]
Sent: Friday, February 06, 2015 5:53 AM
To: user@hadoop.apache.org
Subject: Re: Application Master fails due to Invalid AMRM token

Hey, Rahul

This may be related to:
https://issues.apache.org/jira/browse/MAPREDUCE-6230 and
https://issues.apache.org/jira/browse/YARN-3103

You can manually apply these two patches, and try again.
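For anyone finding this thread later, applying a JIRA patch attachment to a source checkout generally looks like the sketch below; the checkout directory, patch file name, and `-p` strip level are assumptions that depend on how the patch was generated:

```shell
# Assumed: a Hadoop 2.6.x source checkout and the .patch attachment
# downloaded from the YARN-3103 JIRA page.
cd hadoop-2.6.0-src
patch -p0 --dry-run < YARN-3103.patch   # verify it applies cleanly first
patch -p0 < YARN-3103.patch             # actually apply it
mvn package -DskipTests -Pdist          # rebuild, then redeploy the jars
```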


Thanks

Xuan Gong



From: bharath vissapragada bharathvissapragada1...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Wednesday, February 4, 2015 at 11:40 PM
To: user@hadoop.apache.org
Subject: Re: Application Master fails due to Invalid AMRM token

Might be this: https://issues.apache.org/jira/browse/YARN-2964. This bug was
introduced by https://issues.apache.org/jira/browse/YARN-2704.


On Thu, Feb 5, 2015 at 1:04 PM, Rahul Chhiber
rahul.chhi...@cumulus-systems.com wrote:
Hi all,

I am running a 4-node Hadoop cluster (Hadoop 2.6). I am facing an intermittent
error relating to an invalid AMRM token, which causes my YARN application to
crash anywhere from 1 day up to a week after starting. The application launches
successfully and runs for an unpredictable period of time, then crashes with
the following logs (Appmaster.stdout). When I restart it, the application works
perfectly for some time before crashing again with the same error.

2015-02-04 12:55:51,394 [AMRM Heartbeater thread] 
org.apache.hadoop.ipc.Client-DEBUG-The ping interval is 6 ms.
2015-02-04 12:55:51,394 [AMRM Heartbeater thread] 
org.apache.hadoop.ipc.Client-DEBUG-Connecting to masternode/192.168.143.23:8030
2015-02-04 12:55:51,396 [AMRM Heartbeater thread] 
org.apache.hadoop.security.UserGroupInformation-DEBUG-PrivilegedAction as:user1 (auth:SIMPLE) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:717)
2015-02-04 12:55:51,396 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-Sending sasl message state: 
NEGOTIATE

2015-02-04 12:55:51,397 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-Received SASL message state: 
NEGOTIATE
auths {
  method: TOKEN
  mechanism: DIGEST-MD5
  protocol: 
  serverId: default
  challenge: realm=\default\,nonce=\FjsVVqgBotmE1OIpCE6f/KVmiuM3ixIolXg/l5et\,qop=\auth\,charset=utf-8,algorithm=md5-sess
}

2015-02-04 12:55:51,397 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB info:org.apache.hadoop.yarn.security.SchedulerSecurityInfo$1@4ae70093
2015-02-04 12:55:51,397 [AMRM Heartbeater thread] 
org.apache.hadoop.yarn.security.AMRMTokenSelector-DEBUG-Looking for a token with service 192.168.143.23:8030
2015-02-04 12:55:51,397 [AMRM Heartbeater thread] 
org.apache.hadoop.yarn.security.AMRMTokenSelector-DEBUG-Token kind is YARN_AM_RM_TOKEN and the token's service name is 192.168.143.23:8030
2015-02-04 12:55:51,397 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-Creating SASL DIGEST-MD5(TOKEN)  
client to authenticate to service at default
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-Use TOKEN authentication for 
protocol ApplicationMasterProtocolPB
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting 
username: Cg0KCQgBEM3bh860KRACEMyF5q/9/wE=
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting 
userPassword
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-SASL client callback: setting 
realm: default
2015-02-04 12:55:51,398 [AMRM Heartbeater thread] 
org.apache.hadoop.security.SaslRpcClient-DEBUG-Sending sasl message state: 
INITIATE
token: charset=utf-8,username=\Cg0KCQgBEM3bh860KRACEMyF5q/9/wE=\,realm=\default\,nonce=\FjsVVqgBotmE1OIpCE6f/KVmiuM3ixIolXg/l5et\,nc=0001,cnonce=\dETUXCWNF7Aw2ZReFDsKF5jj9bKcyvoQYEJDU9N5\,digest-uri=\/default\,maxbuf=65536,response=1bdc86e7222c86e0692d30db6bec1479,qop=auth
auths {
  method: TOKEN
  mechanism: DIGEST-MD5
  protocol: 
  serverId: default
}

2015-02-04 12:55:51,400 [AMRM Heartbeater thread] org.apache.hadoop.security.UserGroupInformation-DEBUG-PrivilegedActionException as:user1 (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_02
2015-02-04 12:55

at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274)
2015-02-04 12:55:51,423 [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl-ERROR-Stopping callback due to:
org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1422871621069_0001_02
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy12.allocate(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:333)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1422871621069_0001_02
at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy11.allocate(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 8 more
2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.service.AbstractService-DEBUG-Service: org.apache.hadoop.yarn.client.api.async.AMRMClientAsync entered state STOPPED
2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.service.AbstractService-DEBUG-Service: org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl entered state STOPPED
2015-02-04 12:55:51,424 [AMRM Callback Handler Thread] org.apache.hadoop.ipc.Client-DEBUG-stopping client from cache: org.apache.hadoop.ipc.Client@552d7308
2015-02-04 12:56:03,544 [org.eclipse.jetty.server.session.HashSessionManager@425af55fTimer] org.eclipse.jetty.server.session-DEBUG-Scavenging sessions at 1423054563544

Any help is greatly appreciated.

Thanks,
Rahul Chhiber



RE: How to handle Container crash in YARN

2014-12-17 Thread Rahul Chhiber
Sajid,

Check the logs for your container at
$HADOOP_INSTALL_DIR/logs/userlogs/application_id/container_id. Note that
these will be present on the node where the container actually ran.

If the container was not able to start, you might learn something by printing
the stack trace in the onStartContainerError(ContainerId containerId, Throwable t)
callback of the NMClientAsync.CallbackHandler interface.

You should always capture the exit status of each container inside the
onContainersCompleted(List<ContainerStatus> completedContainers) callback
of the AMRMClientAsync.CallbackHandler interface.

Please see the source of the Distributed Shell application on GitHub for an
example of how this is done:
https://github.com/apache/hadoop-common/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell
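A minimal sketch of the exit-status capture described above. To keep it self-contained, a stand-in ContainerStatus class replaces the real org.apache.hadoop.yarn.api.records.ContainerStatus, and the handler class and its field names are illustrative, not taken from the Distributed Shell source:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.hadoop.yarn.api.records.ContainerStatus;
// the real class carries more fields (state, diagnostics, etc.).
class ContainerStatus {
    private final String containerId;
    private final int exitStatus;
    ContainerStatus(String containerId, int exitStatus) {
        this.containerId = containerId;
        this.exitStatus = exitStatus;
    }
    String getContainerId() { return containerId; }
    int getExitStatus()     { return exitStatus; }
}

// Sketch of the AMRMClientAsync.CallbackHandler pattern: record every
// completed container's exit status so non-zero codes (crashes) are
// never silently dropped.
public class CrashAwareHandler {
    final List<String> failures = new ArrayList<>();

    public void onContainersCompleted(List<ContainerStatus> completed) {
        for (ContainerStatus s : completed) {
            if (s.getExitStatus() != 0) {
                failures.add(s.getContainerId() + " exited with " + s.getExitStatus());
            }
        }
    }

    public static void main(String[] args) {
        CrashAwareHandler h = new CrashAwareHandler();
        List<ContainerStatus> done = new ArrayList<>();
        done.add(new ContainerStatus("container_01", 0));
        done.add(new ContainerStatus("container_02", 2));
        h.onContainersCompleted(done);
        System.out.println(h.failures);  // [container_02 exited with 2]
    }
}
```

In a real AM the failure list would typically drive a re-request of replacement containers from the RM rather than just logging.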

Regards,
Rahul Chhiber

From: Sajid Syed [mailto:sajid...@gmail.com]
Sent: Thursday, December 18, 2014 9:44 AM
To: user@hadoop.apache.org
Subject: How to handle Container crash in YARN

Hello,

Can anyone please explain how to handle/resolve container crashes in YARN/Hadoop?

Thanks
Sajid Syed