[ 
https://issues.apache.org/jira/browse/FLINK-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goel closed FLINK-5410.
------------------------------
    Resolution: Later

Darn it. Completely slipped my mind. Asking this on the mailing list first, 
before filing a bug. 

> Running out of memory on Yarn
> -----------------------------
>
>                 Key: FLINK-5410
>                 URL: https://issues.apache.org/jira/browse/FLINK-5410
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>            Reporter: Sachin Goel
>
> I'm running locally under this configuration (copied from the nodemanager 
> logs):
> physical-memory=8192 virtual-memory=17204 virtual-cores=8
> Before starting a Flink deployment, memory usage stats show 3.7 GB used on 
> the system, indicating plenty of free memory for Flink containers.
> However, after I submit with minimal resource requirements, 
> ./yarn-session.sh -n 1 -tm 768, the cluster deploys successfully, but then 
> every application on the system receives a SIGTERM, which kills the 
> current user session and logs me out of the system.
> The job manager and task manager logs contain only the information that a 
> SIGTERM was received and that they shut down gracefully.
> The logs of all YARN and DFS processes also show the receipt of a 
> SIGTERM. 
> Here's the relevant log from the nodemanager: 
> {code}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Starting resource-monitoring for container_1483603191971_0002_01_000002
> 2017-01-05 17:00:08,744 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Memory usage of ProcessTree 17872 for container-id 
> container_1483603191971_0002_01_000001: 282.7 MB of 1 GB physical memory 
> used; 2.1 GB of 2.1 GB virtual memory used
> 2017-01-05 17:00:08,744 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Process tree for container: container_1483603191971_0002_01_000001 has 
> processes older than 1 iteration running over the configured limit. 
> Limit=2254857728, current usage = 2255896576
> 2017-01-05 17:00:08,745 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Container [pid=17872,containerID=container_1483603191971_0002_01_000001] is 
> running beyond virtual memory limits. Current usage: 282.7 MB of 1 GB 
> physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1483603191971_0002_01_000001 :
>       |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>       |- 17872 17870 17872 17872 (bash) 0 0 21409792 812 /bin/bash -c 
> /usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M  
> -Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.log
>  -Dlogback.configurationFile=file:logback.xml 
> -Dlog4j.configuration=file:log4j.properties 
> org.apache.flink.yarn.YarnApplicationMasterRunner  
> 1>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.out
>  
> 2>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.err
>  
>       |- 17879 17872 17872 17872 (java) 748 20 2234486784 71553 
> /usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M 
> -Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_000001/jobmanager.log
>  -Dlogback.configurationFile=file:logback.xml 
> -Dlog4j.configuration=file:log4j.properties 
> org.apache.flink.yarn.YarnApplicationMasterRunner 
> 2017-01-05 17:00:08,745 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Removed ProcessTree with root 17872
> 2017-01-05 17:00:08,746 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1483603191971_0002_01_000001 transitioned from RUNNING 
> to KILLING
> 2017-01-05 17:00:08,746 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1483603191971_0002_01_000001
> 2017-01-05 17:00:08,779 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: 
> SIGTERM
> 2017-01-05 17:00:08,822 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code 
> from container container_1483603191971_0002_01_000001 is : 143
> {code}
> Is the memory available on my PC insufficient, or are there any known issues 
> that might lead to this? 
> Also, this doesn't occur every time I start a Flink session. 
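
For anyone reading the log above: the numbers in it can be cross-checked with a bit of arithmetic. The sketch below is only an illustration, not anything taken from YARN or Flink code. The byte counts are copied from the nodemanager log; the 2.1 ratio is assumed to be the default of yarn.nodemanager.vmem-pmem-ratio, and the float32 rounding is my assumption about why the logged limit is slightly below an exact 2.1 GiB. The helper names (as_float32, vmem_limit, overage) are made up for this example.

```python
import struct

# Values copied from the nodemanager log above.
PHYSICAL_LIMIT_BYTES = 1 * 1024**3   # "1 GB physical memory"
VMEM_LIMIT_BYTES = 2254857728        # "Limit=2254857728"
PROCESS_TREE_VMEM = {
    17872: 21409792,     # bash wrapper (VMEM_USAGE column)
    17879: 2234486784,   # java, YarnApplicationMasterRunner
}


def as_float32(value):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def vmem_limit(physical_bytes, ratio=2.1):
    """Virtual memory limit, assuming the ratio is held as a 32-bit float."""
    return int(as_float32(ratio) * physical_bytes)


def overage(process_vmem, limit):
    """Bytes by which the summed process-tree vmem exceeds the limit."""
    return sum(process_vmem.values()) - limit


if __name__ == "__main__":
    print("computed limit:", vmem_limit(PHYSICAL_LIMIT_BYTES))
    print("tree usage:    ", sum(PROCESS_TREE_VMEM.values()))
    print("over by bytes: ", overage(PROCESS_TREE_VMEM, VMEM_LIMIT_BYTES))
```

Under these assumptions the computed limit matches the logged Limit value, the two VMEM_USAGE entries add up to the logged "current usage", and the container is over the virtual memory limit by about 1 MB while using only 282.7 MB of physical memory. That suggests the kill is triggered by the virtual memory check rather than by the machine running out of RAM, although it does not explain why the nodemanager itself received a SIGTERM.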



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
