[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875702#comment-15875702 ]
Deenbandhu Agarwal edited comment on SPARK-19644 at 2/21/17 9:54 AM: --------------------------------------------------------------------- I have analysed the issue further. I ran the following experiments and inspected heap dumps with jvisualvm at regular intervals:

1. Dstream.foreachRDD(rdd => rdd.map(r => someCaseClass(r)).take(10).foreach(println))
2. Dstream.foreachRDD(rdd => rdd.map(r => someCaseClass(r)).toDF.show(10, false))
3. Dstream.foreachRDD(rdd => rdd.map(r => someCaseClass(r)).toDS.show(10, false))

I observed that the number of instances of scala.collection.immutable.$colon$colon remains constant in scenario 1 but keeps increasing in scenarios 2 and 3. So I suspect something in the toDF or toDS path is leaking; this may help in tracking down the issue.
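For context on the class named in the heap dumps: scala.collection.immutable.$colon$colon is the JVM-mangled name of Scala's :: (cons) class, i.e. every node of a non-empty immutable List. A minimal, Spark-free sketch confirming this (object and value names are illustrative only):

```scala
object ConsCellDemo {
  def main(args: Array[String]): Unit = {
    // Every non-empty immutable List is an instance of ::,
    // whose JVM class name is scala.collection.immutable.$colon$colon.
    val xs = List(1, 2, 3)
    println(xs.getClass.getName) // scala.collection.immutable.$colon$colon
  }
}
```

So a steadily growing count of $colon$colon instances in a heap dump means List cons cells are being allocated and retained somewhere rather than collected.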
> Memory leak in Spark Streaming
> ------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xlarge
>   Number of cores: 3
>   Number of executors: 3
>   Memory per executor: 2 GB
> Reporter: Deenbandhu Agarwal
> Priority: Critical
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png
>
> I am using streaming in production for some aggregation, fetching data from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old-generation heap usage from 1161216 bytes to 1397760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.
> I think this is a clear case of a memory leak.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)