Hi Nitin,

Sorry for waking up this ancient thread. That's a fantastic set of JVM flags! We just hit the same problem, but we hadn't even discovered all those flags for limiting memory growth. Did you ever find out anything further?
I see you also set -XX:NewRatio=3. This flag has become very important since Spark 1.6.0: with unified memory management and the default spark.memory.fraction=0.75, the cache will fill up to 75% of the heap. The JVM default is NewRatio=2, which makes the old generation only 2/3 (~67%) of the heap, so the cache does not fit in the old generation pool and full GCs are constantly triggered. With NewRatio=3 the old generation pool is 75% of the heap, so it (just) fits the cache. We find this makes a very significant performance difference in practice. Perhaps this should be documented somewhere, or the default spark.memory.fraction should be lowered to 0.66 so that it works out with the default JVM flags. (A minimal sketch of the settings I mean is at the bottom of this mail, below the quoted message.)

On Mon, Jul 27, 2015 at 6:08 PM, Nitin Goyal <nitin2go...@gmail.com> wrote:

> I am running a Spark application in YARN with 2 executors, Xms/Xmx of
> 32 gigs, and spark.yarn.executor.memoryOverhead of 6 gigs.
>
> I am seeing that the app's physical memory keeps increasing until the
> container is finally killed by the node manager:
>
> 2015-07-25 15:07:05,354 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Container [pid=10508,containerID=container_1437828324746_0002_01_000003] is
> running beyond physical memory limits. Current usage: 38.0 GB of 38 GB
> physical memory used; 39.5 GB of 152 GB virtual memory used. Killing
> container.
> Dump of the process-tree for container_1437828324746_0002_01_000003 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 10508 9563 10508 10508 (bash) 0 0 9433088 314 /bin/bash -c
> /usr/java/default/bin/java -server -XX:OnOutOfMemoryError='kill %p'
> -Xms32768m -Xmx32768m -Dlog4j.configuration=log4j-executor.properties
> -XX:MetaspaceSize=512m -XX:+UseG1GC -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log
> -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+UseGCLogFileRotation
> -XX:GCLogFileSize=500M -XX:NumberOfGCLogFiles=1
> -XX:MaxDirectMemorySize=3500M -XX:NewRatio=3 -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=36082
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false -XX:NativeMemoryTracking=detail
> -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=512m
> -XX:CompressedClassSpaceSize=256m
> -Djava.io.tmpdir=/data/yarn/datanode/nm-local-dir/usercache/admin/appcache/application_1437828324746_0002/container_1437828324746_0002_01_000003/tmp
> '-Dspark.driver.port=43354'
> -Dspark.yarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> akka.tcp://sparkDriver@nn1:43354/user/CoarseGrainedScheduler 1 dn3 6
> application_1437828324746_0002
> 1> /opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003/stdout
> 2> /opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003/stderr
>
> I disabled YARN's "yarn.nodemanager.pmem-check-enabled" parameter and
> noticed that physical memory usage went up to 40 gigs.
>
> I checked the total RSS in /proc/pid/smaps and it was the same value as the
> physical memory reported by YARN and seen in the top command.
>
> I verified that it is not a problem with the heap; something is growing in
> off-heap/native memory. I used tools like VisualVM but didn't find anything
> increasing there. MaxDirectMemory also didn't exceed 600MB.
> The peak number of active threads was 70-80, and thread stack size didn't
> exceed 100MB. Metaspace usage was around 60-70MB.
>
> FYI, I am on Spark 1.2 and Hadoop 2.4.0, and my Spark application is based
> on Spark SQL. It is an HDFS read/write intensive application and caches
> data via Spark SQL's in-memory caching.
>
> Any help would be highly appreciated, as would any hint about where I
> should look to debug the memory leak, or whether a tool for this already
> exists. Let me know if any other information is needed.
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Ever-increasing-physical-memory-for-a-Spark-Application-in-YARN-tp13446.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
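To make the arithmetic concrete, here is the kind of submit we mean. This is only a sketch: the 32g executor size is taken from the quoted mail, your-app.jar is a placeholder, and spark.memory.fraction=0.75 is simply the 1.6.0 default spelled out.

  # With -XX:NewRatio=3 the old generation is 3/4 (75%) of the heap, which
  # (just) holds the ~75% of the heap that spark.memory.fraction=0.75 hands
  # to the unified memory manager for execution + cached data.
  # With the JVM default -XX:NewRatio=2 the old generation is only ~67%,
  # so the cache does not fit and full GCs are triggered constantly.
  spark-submit \
    --master yarn \
    --executor-memory 32g \
    --conf spark.memory.fraction=0.75 \
    --conf "spark.executor.extraJavaOptions=-XX:NewRatio=3 -XX:+UseG1GC" \
    your-app.jar   # placeholder for the actual application jar

  # The alternative mentioned above: keep the JVM default NewRatio=2 and
  # shrink the Spark side instead, e.g. --conf spark.memory.fraction=0.66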
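And for the native-memory side of the question: since the executors already run with -XX:NativeMemoryTracking=detail, one way to see how much of the RSS the JVM can actually account for is to compare NMT's summary against /proc/<pid>/smaps on the worker node. A rough sketch, assuming a single executor per node:

  # Find the executor JVM on this node (assumes one executor per node)
  PID=$(pgrep -f CoarseGrainedExecutorBackend)

  # Memory the JVM knows about: heap, metaspace, thread stacks, code cache, ...
  jcmd "$PID" VM.native_memory summary

  # Memory the kernel charges the process for -- roughly what YARN's
  # physical-memory check and top report
  grep '^Rss' "/proc/$PID/smaps" | awk '{ sum += $2 } END { print sum " kB" }'

A large and growing gap between the two usually points at allocations made outside the JVM's own bookkeeping (JNI libraries, mmap'd files, native buffers), which matches the symptoms described above.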