[ https://issues.apache.org/jira/browse/FLINK-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sachin Goel closed FLINK-5410.
------------------------------
    Resolution: Later

Darn it. Completely slipped my mind. Asking this on the mailing list first before filing a bug.

> Running out of memory on Yarn
> -----------------------------
>
>                 Key: FLINK-5410
>                 URL: https://issues.apache.org/jira/browse/FLINK-5410
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>            Reporter: Sachin Goel
>
> I'm running locally under this configuration (copied from the nodemanager logs):
> physical-memory=8192 virtual-memory=17204 virtual-cores=8
> Before starting a Flink deployment, memory usage stats show 3.7 GB used on the
> system, indicating plenty of free memory for Flink containers.
> However, after I submit using minimal resource requirements
> (./yarn-session.sh -n 1 -tm 768), the cluster deploys successfully, but then
> every application on the system receives a SIGTERM, which kills the current
> user session and logs me out of the system.
> The job manager and task manager logs contain only the information that a
> SIGTERM was received and that they shut down gracefully.
> All YARN and DFS processes log the receipt of a SIGTERM.
> Here's the relevant log from the nodemanager:
> {code}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1483603191971_0002_01_000002
> 2017-01-05 17:00:08,744 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 17872 for container-id container_1483603191971_0002_01_000001: 282.7 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used
> 2017-01-05 17:00:08,744 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1483603191971_0002_01_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2255896576
> 2017-01-05 17:00:08,745 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=17872,containerID=container_1483603191971_0002_01_000001] is running beyond virtual memory limits. Current usage: 282.7 MB of 1 GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1483603191971_0002_01_000001 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 17872 17870 17872 17872 (bash) 0 0 21409792 812 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M -Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner 1>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.out 2>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.err
> |- 17879 17872 17872 17872 (java) 748 20 2234486784 71553 /usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M -Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.log -Dlogback.configurationFile=file:logback.xml -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.YarnApplicationMasterRunner
> 2017-01-05 17:00:08,745 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 17872
> 2017-01-05 17:00:08,746 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1483603191971_0002_01_000001 transitioned from RUNNING to KILLING
> 2017-01-05 17:00:08,746 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1483603191971_0002_01_000001
> 2017-01-05 17:00:08,779 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> 2017-01-05 17:00:08,822 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1483603191971_0002_01_000001 is : 143
> {code}
> Is the memory available on my PC not enough, or are there any known issues
> that might lead to this?
> Also, this doesn't occur every time I start a Flink session.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
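
The numbers in the nodemanager log can be checked by hand, and a small script makes the arithmetic explicit. This is only a sketch working from the values quoted in the issue, not part of the original report. It assumes that the "1 GB" physical allocation means 1024**3 bytes and that the virtual limit is that allocation multiplied by YARN's virtual-to-physical ratio (`yarn.nodemanager.vmem-pmem-ratio`, which to my knowledge defaults to 2.1) held as a single-precision float. The helper name `parse_limit_and_usage` is made up for illustration.

```python
import re
import struct

# Values copied from the nodemanager log quoted in the issue.
LOG_LINE = "Limit=2254857728, current usage = 2255896576"
BASH_VMEM = 21409792      # VMEM_USAGE(BYTES) of PID 17872 (bash)
JAVA_VMEM = 2234486784    # VMEM_USAGE(BYTES) of PID 17879 (java)

# Assumption: 1 GB physical memory means 1024**3 bytes, and the ratio 2.1
# is stored as a 32-bit float before being multiplied in.
PMEM_BYTES = 1024 ** 3
RATIO_F32 = struct.unpack("f", struct.pack("f", 2.1))[0]


def parse_limit_and_usage(line):
    """Extract (limit, usage) in bytes from 'Limit=..., current usage = ...'."""
    match = re.search(r"Limit=(\d+), current usage = (\d+)", line)
    if match is None:
        raise ValueError("line does not contain a limit/usage pair")
    return int(match.group(1)), int(match.group(2))


limit, usage = parse_limit_and_usage(LOG_LINE)
tree_total = BASH_VMEM + JAVA_VMEM          # virtual memory of the whole process tree
overage = usage - limit                     # how far the container is over the limit
derived_limit = int(PMEM_BYTES * RATIO_F32) # limit recomputed under the assumptions above

print(f"process-tree virtual memory: {tree_total} bytes")
print(f"overage above limit: {overage} bytes ({overage / 1024**2:.2f} MiB)")
print(f"limit derived from 1 GiB * 2.1 (float32): {derived_limit} bytes")
```

Under these assumptions, the two process-tree entries add up to the reported usage, and the container is over its virtual memory limit by roughly 1 MiB while using only 282.7 MB of physical memory. That points at the virtual memory check, not at a shortage of physical memory on the machine, as the trigger for the kill of this container.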