Re: Ever increasing physical memory for a Spark Application in YARN

2016-05-03 Thread Nitin Goyal
Hi Daniel,

I did track down the problem in my case. It turned out to be a bug on the
Parquet side; I raised and contributed to the following issue:

https://issues.apache.org/jira/browse/PARQUET-353

Hope this helps!

Thanks
-Nitin


On Mon, May 2, 2016 at 9:15 PM, Daniel Darabos <
daniel.dara...@lynxanalytics.com> wrote:

> Hi Nitin,
> Sorry for waking up this ancient thread. That's a fantastic set of JVM
> flags! We just hit the same problem, but we haven't even discovered all
> those flags for limiting memory growth. I wanted to ask if you ever
> discovered anything further?
>
> I see you also set -XX:NewRatio=3. This is a very important flag since
> Spark 1.6.0. With unified memory management and the default
> spark.memory.fraction=0.75, the cache can fill up to 75% of the heap. The
> default NewRatio is 2, so the cache will not fit in the old generation
> pool, constantly triggering full GCs. With NewRatio=3 the old generation
> pool is 75% of the heap, so it (just) fits the cache. We find this makes a
> very significant performance difference in practice.
>
> Perhaps this should be documented somewhere. Or the default
> spark.memory.fraction should be 0.66, so that it works out with the default
> JVM flags.
>
> On Mon, Jul 27, 2015 at 6:08 PM, Nitin Goyal <nitin2go...@gmail.com>
> wrote:
>
>> I am running a Spark application in YARN with 2 executors, each with
>> Xms/Xmx of 32 GB and spark.yarn.executor.memoryOverhead of 6 GB.
>>
>> I am seeing that the app's physical memory keeps increasing until the
>> container finally gets killed by the node manager:
>>
>> 2015-07-25 15:07:05,354 WARN
>>
>> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>> Container [pid=10508,containerID=container_1437828324746_0002_01_03]
>> is
>> running beyond physical memory limits. Current usage: 38.0 GB of 38 GB
>> physical memory used; 39.5 GB of 152 GB virtual memory used. Killing
>> container.
>> Dump of the process-tree for container_1437828324746_0002_01_03 :
>> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>> |- 10508 9563 10508 10508 (bash) 0 0 9433088 314 /bin/bash -c
>> /usr/java/default/bin/java -server -XX:OnOutOfMemoryError='kill %p'
>> -Xms32768m -Xmx32768m  -Dlog4j.configuration=log4j-executor.properties
>> -XX:MetaspaceSize=512m -XX:+UseG1GC -XX:+PrintGCTimeStamps
>> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log
>> -XX:AdaptiveSizePolicyOutputInterval=1  -XX:+UseGCLogFileRotation
>> -XX:GCLogFileSize=500M -XX:NumberOfGCLogFiles=1
>> -XX:MaxDirectMemorySize=3500M -XX:NewRatio=3
>> -Dcom.sun.management.jmxremote
>> -Dcom.sun.management.jmxremote.port=36082
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.ssl=false -XX:NativeMemoryTracking=detail
>> -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=512m
>> -XX:CompressedClassSpaceSize=256m
>>
>> -Djava.io.tmpdir=/data/yarn/datanode/nm-local-dir/usercache/admin/appcache/application_1437828324746_0002/container_1437828324746_0002_01_03/tmp
>> '-Dspark.driver.port=43354'
>>
>> -Dspark.yarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03
>> org.apache.spark.executor.CoarseGrainedExecutorBackend
>> akka.tcp://sparkDriver@nn1:43354/user/CoarseGrainedScheduler 1 dn3 6
>> application_1437828324746_0002 1>
>>
>> /opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03/stdout
>> 2>
>>
>> /opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03/stderr
>>
>>
>> I disabled YARN's parameter "yarn.nodemanager.pmem-check-enabled" and
>> noticed that physical memory usage went up to 40 GB.
>>
>> I checked the total RSS in /proc/pid/smaps and it was the same value as
>> the physical memory reported by YARN and seen in the top command.
>>
>> I verified that it's not a problem with the heap; something is increasing
>> in off-heap/native memory. I used tools like VisualVM but didn't find
>> anything increasing there. MaxDirectMemory also didn't exceed 600 MB. The
>> peak number of active threads was 70-80, total thread stack size didn't
>> exceed 100 MB, and metaspace usage was around 60-70 MB.
>>
>> FYI, I am on Spark 1.2 and Hadoop 2.4.0. My Spark application is based on
>> Spark SQL; it is an HDFS read/write-intensive application and caches data
>> in Spark SQL's in-memory caching.

Re: Ever increasing physical memory for a Spark Application in YARN

2016-05-02 Thread Daniel Darabos
Hi Nitin,
Sorry for waking up this ancient thread. That's a fantastic set of JVM
flags! We just hit the same problem, but we haven't even discovered all
those flags for limiting memory growth. I wanted to ask if you ever
discovered anything further?

I see you also set -XX:NewRatio=3. This is a very important flag since
Spark 1.6.0. With unified memory management and the default
spark.memory.fraction=0.75, the cache can fill up to 75% of the heap. The
default NewRatio is 2, so the cache will not fit in the old generation
pool, constantly triggering full GCs. With NewRatio=3 the old generation
pool is 75% of the heap, so it (just) fits the cache. We find this makes a
very significant performance difference in practice.

Perhaps this should be documented somewhere. Or the default
spark.memory.fraction should be 0.66, so that it works out with the default
JVM flags.
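
For reference, here is a minimal sketch of how one might apply this on YARN
(a sketch only: the spark-submit flags and values are illustrative, and
myapp.jar stands in for the application jar):

    # NewRatio is the old:young generation ratio, so the old generation gets
    # NewRatio/(NewRatio+1) of the heap: 2/3 (~67%) with the default NewRatio=2,
    # and 3/4 (75%) with NewRatio=3, which is just enough to hold a cache sized
    # by spark.memory.fraction=0.75.
    spark-submit \
      --master yarn \
      --conf spark.executor.extraJavaOptions="-XX:NewRatio=3" \
      myapp.jar

    # Alternatively, shrink the unified memory pool so it fits inside the
    # default old generation:
    #   --conf spark.memory.fraction=0.66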

On Mon, Jul 27, 2015 at 6:08 PM, Nitin Goyal <nitin2go...@gmail.com> wrote:

> I am running a Spark application in YARN with 2 executors, each with
> Xms/Xmx of 32 GB and spark.yarn.executor.memoryOverhead of 6 GB.
>
> I am seeing that the app's physical memory keeps increasing until the
> container finally gets killed by the node manager:
>
> 2015-07-25 15:07:05,354 WARN
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Container [pid=10508,containerID=container_1437828324746_0002_01_03] is
> running beyond physical memory limits. Current usage: 38.0 GB of 38 GB
> physical memory used; 39.5 GB of 152 GB virtual memory used. Killing
> container.
> Dump of the process-tree for container_1437828324746_0002_01_03 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 10508 9563 10508 10508 (bash) 0 0 9433088 314 /bin/bash -c
> /usr/java/default/bin/java -server -XX:OnOutOfMemoryError='kill %p'
> -Xms32768m -Xmx32768m  -Dlog4j.configuration=log4j-executor.properties
> -XX:MetaspaceSize=512m -XX:+UseG1GC -XX:+PrintGCTimeStamps
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log
> -XX:AdaptiveSizePolicyOutputInterval=1  -XX:+UseGCLogFileRotation
> -XX:GCLogFileSize=500M -XX:NumberOfGCLogFiles=1
> -XX:MaxDirectMemorySize=3500M -XX:NewRatio=3 -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=36082
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false -XX:NativeMemoryTracking=detail
> -XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=512m
> -XX:CompressedClassSpaceSize=256m
>
> -Djava.io.tmpdir=/data/yarn/datanode/nm-local-dir/usercache/admin/appcache/application_1437828324746_0002/container_1437828324746_0002_01_03/tmp
> '-Dspark.driver.port=43354'
>
> -Dspark.yarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> akka.tcp://sparkDriver@nn1:43354/user/CoarseGrainedScheduler 1 dn3 6
> application_1437828324746_0002 1>
>
> /opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03/stdout
> 2>
>
> /opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03/stderr
>
>
> I disabled YARN's parameter "yarn.nodemanager.pmem-check-enabled" and
> noticed that physical memory usage went up to 40 GB.
>
> I checked the total RSS in /proc/pid/smaps and it was the same value as
> the physical memory reported by YARN and seen in the top command.
>
> I verified that it's not a problem with the heap; something is increasing
> in off-heap/native memory. I used tools like VisualVM but didn't find
> anything increasing there. MaxDirectMemory also didn't exceed 600 MB. The
> peak number of active threads was 70-80, total thread stack size didn't
> exceed 100 MB, and metaspace usage was around 60-70 MB.
>
> FYI, I am on Spark 1.2 and Hadoop 2.4.0. My Spark application is based on
> Spark SQL; it is an HDFS read/write-intensive application and caches data
> in Spark SQL's in-memory caching.
>
> Any help would be highly appreciated, as would any hint about where I
> should look to debug the memory leak, or whether a tool already exists for
> this. Let me know if any other information is needed.


Ever increasing physical memory for a Spark Application in YARN

2015-07-27 Thread Nitin Goyal
I am running a Spark application in YARN with 2 executors, each with Xms/Xmx
of 32 GB and spark.yarn.executor.memoryOverhead of 6 GB.
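
For context, the executors are launched with roughly the following (a sketch
only; the exact spark-submit command is not shown here, and myapp.jar stands
in for the application jar):

    # 32 GB heap + 6 GB overhead = 38 GB, which is exactly the "38 GB physical
    # memory" limit the node manager enforces in the log below.
    spark-submit \
      --master yarn \
      --num-executors 2 \
      --executor-memory 32g \
      --conf spark.yarn.executor.memoryOverhead=6144 \
      myapp.jar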

I am seeing that the app's physical memory keeps increasing until the
container finally gets killed by the node manager:

2015-07-25 15:07:05,354 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Container [pid=10508,containerID=container_1437828324746_0002_01_03] is
running beyond physical memory limits. Current usage: 38.0 GB of 38 GB
physical memory used; 39.5 GB of 152 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1437828324746_0002_01_03 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 10508 9563 10508 10508 (bash) 0 0 9433088 314 /bin/bash -c
/usr/java/default/bin/java -server -XX:OnOutOfMemoryError='kill %p'
-Xms32768m -Xmx32768m  -Dlog4j.configuration=log4j-executor.properties
-XX:MetaspaceSize=512m -XX:+UseG1GC -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log
-XX:AdaptiveSizePolicyOutputInterval=1  -XX:+UseGCLogFileRotation
-XX:GCLogFileSize=500M -XX:NumberOfGCLogFiles=1
-XX:MaxDirectMemorySize=3500M -XX:NewRatio=3 -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=36082
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -XX:NativeMemoryTracking=detail
-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=512m
-XX:CompressedClassSpaceSize=256m
-Djava.io.tmpdir=/data/yarn/datanode/nm-local-dir/usercache/admin/appcache/application_1437828324746_0002/container_1437828324746_0002_01_03/tmp
'-Dspark.driver.port=43354'
-Dspark.yarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03
org.apache.spark.executor.CoarseGrainedExecutorBackend
akka.tcp://sparkDriver@nn1:43354/user/CoarseGrainedScheduler 1 dn3 6
application_1437828324746_0002 1>
/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03/stdout
2>
/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_03/stderr


I disabled YARN's parameter yarn.nodemanager.pmem-check-enabled and noticed
that physical memory usage went up to 40 GB.

I checked the total RSS in /proc/pid/smaps and it was the same value as the
physical memory reported by YARN and seen in the top command.

I verified that it's not a problem with the heap; something is increasing in
off-heap/native memory. I used tools like VisualVM but didn't find anything
increasing there. MaxDirectMemory also didn't exceed 600 MB. The peak number
of active threads was 70-80, total thread stack size didn't exceed 100 MB,
and metaspace usage was around 60-70 MB.
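
In case it helps, the checks above look roughly like the following (a sketch;
<pid> is the executor JVM's process id, and the jcmd call only works because
-XX:NativeMemoryTracking=detail is already on the executor command line):

    # Total resident set size from the kernel's view (matches YARN and top):
    awk '/^Rss:/ {sum += $2} END {print sum " kB"}' /proc/<pid>/smaps

    # JVM-side native memory breakdown (heap, metaspace, thread stacks, code
    # cache); memory malloc'd by native libraries outside the JVM's own
    # allocators will not show up here:
    jcmd <pid> VM.native_memory summary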

FYI, I am on Spark 1.2 and Hadoop 2.4.0. My Spark application is based on
Spark SQL; it is an HDFS read/write-intensive application and caches data in
Spark SQL's in-memory caching.

Any help would be highly appreciated, as would any hint about where I should
look to debug the memory leak, or whether a tool already exists for this. Let
me know if any other information is needed.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Ever-increasing-physical-memory-for-a-Spark-Application-in-YARN-tp13446.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
