[ https://issues.apache.org/jira/browse/SPARK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guoqiang Li updated SPARK-1930:
-------------------------------
    Summary: The Container is running beyond physical memory limits, so as to be killed.  (was: Container is running beyond physical memory limits)

> The Container is running beyond physical memory limits, so as to be killed.
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-1930
>                 URL: https://issues.apache.org/jira/browse/SPARK-1930
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>            Reporter: Guoqiang Li
>             Fix For: 1.0.1
>
>
> When a container's memory usage reaches about 8 GB, the container is killed.
> YARN NodeManager log:
> {code}
> 2014-05-23 13:35:30,776 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=4947,containerID=container_1400809535638_0015_01_000005] is running beyond physical memory limits. Current usage: 8.6 GB of 8.5 GB physical memory used; 10.0 GB of 17.8 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1400809535638_0015_01_000005 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 4947 25417 4947 4947 (bash) 0 0 110804992 335 /bin/bash -c /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms8192m -Xmx8192m -Xss2m -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005/tmp -Dlog4j.configuration=log4j-spark-container.properties -Dspark.akka.askTimeout="120" -Dspark.akka.timeout="120" -Dspark.akka.frameSize="20" org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sp...@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 10dian72.domain.test 4 1> /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_000005/stdout 2> /var/log/hadoop-yarn/container/application_1400809535638_0015/container_1400809535638_0015_01_000005/stderr
> |- 4957 4947 4947 4947 (java) 157809 12620 10667016192 2245522 /usr/java/jdk1.7.0_45-cloudera/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms8192m -Xmx8192m -Xss2m -Djava.io.tmpdir=/yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005/tmp -Dlog4j.configuration=log4j-spark-container.properties -Dspark.akka.askTimeout=120 -Dspark.akka.timeout=120 -Dspark.akka.frameSize=20 org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sp...@10dian71.domain.test:45477/user/CoarseGrainedScheduler 3 10dian72.domain.test 4
> 2014-05-23 13:35:30,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 4947
> 2014-05-23 13:35:30,776 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_000005 transitioned from RUNNING to KILLING
> 2014-05-23 13:35:30,777 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1400809535638_0015_01_000005
> 2014-05-23 13:35:30,788 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1400809535638_0015_01_000005 is : 143
> 2014-05-23 13:35:30,829 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_000005 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
> 2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /yarn/nm/usercache/spark/appcache/application_1400809535638_0015/container_1400809535638_0015_01_000005
> 2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=spark OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1400809535638_0015 CONTAINERID=container_1400809535638_0015_01_000005
> 2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1400809535638_0015_01_000005 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2014-05-23 13:35:30,830 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Removing container_1400809535638_0015_01_000005 from application application_1400809535638_0015
> {code}
> I think this is related to {{YarnAllocationHandler.MEMORY_OVERHEAD}}:
> https://github.com/apache/spark/blob/master/yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala#L562
> Relative to an 8 GB executor heap, a fixed 384 MB overhead is too small.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
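To make the reported arithmetic concrete: the container limit is the executor heap plus a fixed 384 MB overhead, while the process tree in the log shows an RSS of 2245522 pages (4 KB each), i.e. the JVM's off-heap footprint (thread stacks, permgen, direct buffers) alone exceeded the overhead budget. The sketch below is illustrative only — the object and method names are hypothetical; only the 384 MB constant, the 8192 MB heap, and the RSS figure come from the issue above.

```scala
// Sketch of the container-size arithmetic behind this failure. The 384 MB
// constant mirrors YarnAllocationHandler.MEMORY_OVERHEAD in Spark 1.0; the
// object/method names here are made up for illustration.
object MemoryOverheadSketch {
  val MemoryOverheadMb = 384 // fixed, regardless of executor size

  /** Memory (MB) YARN is asked to reserve for one executor container. */
  def containerRequest(executorMemoryMb: Int): Int =
    executorMemoryMb + MemoryOverheadMb

  def main(args: Array[String]): Unit = {
    val heapMb = 8192                        // -Xms8192m -Xmx8192m from the log
    val limitMb = containerRequest(heapMb)   // 8576 MB requested from YARN
    val rssMb = 2245522L * 4 / 1024          // 2245522 pages * 4 KB, from the log
    println(s"container request = $limitMb MB")
    println(s"observed RSS      = $rssMb MB")
    println(s"off-heap usage    = ${rssMb - heapMb} MB, budget = $MemoryOverheadMb MB")
  }
}
```

Running this shows an off-heap footprint of roughly 579 MB against a 384 MB budget, which is why the NodeManager's physical-memory check fires; scaling the overhead with the heap (rather than keeping it constant) is the fix this report is arguing for.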