[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-06-18 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053502#comment-16053502
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


And the Spark Cassandra connector, which is a dependency for us, is not yet 
released for those Spark versions.

> Memory leak in Spark Streaming
> --
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of the class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.






[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-06-18 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053498#comment-16053498
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


Yes, I can try, but is there any report of such events in that particular version?





[jira] [Comment Edited] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics

2017-06-18 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053488#comment-16053488
 ] 

Deenbandhu Agarwal edited comment on SPARK-17381 at 6/19/17 5:33 AM:
-

[~joaomaiaduarte] I am facing a similar kind of issue. I am running Spark 
Streaming in the production environment with 6 executors, each with 1 GB of 
memory and 1 core, and a driver with 3 GB. The Spark version used is 2.0.1. 
Objects of some linked list are accumulating over time in the JVM heap of the 
driver, and after 2-3 hours the GC becomes very frequent and jobs start queuing 
up. I tried your solution but in vain. We are not using a linked list anywhere. 
You can find details of the issue here: [https://issues.apache.org/jira/browse/SPARK-19644]


was (Author: deenbandhu):
[~joaomaiaduarte] I am facing a similar kind of issue. I am running Spark 
Streaming in the production environment with 6 executors, each with 1 GB of 
memory and 1 core, and a driver with 3 GB. The Spark version used is 2.0.1. 
Objects of some linked list are accumulating over time in the JVM heap of the 
driver, and after 2-3 hours the GC becomes very frequent and jobs start queuing 
up. I tried your solution but in vain. We are not using a linked list anywhere.

> Memory leak  org.apache.spark.sql.execution.ui.SQLTaskMetrics
> -
>
> Key: SPARK-17381
> URL: https://issues.apache.org/jira/browse/SPARK-17381
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: EMR 5.0.0 (submitted as yarn-client)
> Java Version  1.8.0_101 (Oracle Corporation)
> Scala Version version 2.11.8
> Problem also happens when I run locally with similar versions of java/scala. 
> OS: Ubuntu 16.04
>Reporter: Joao Duarte
>
> I am running a Spark Streaming application from a Kinesis stream. After some 
> hours of running it gets out of memory. After a driver heap dump I found two 
> problems:
> 1) a huge amount of org.apache.spark.sql.execution.ui.SQLTaskMetrics (it seems 
> this was a problem before: 
> https://issues.apache.org/jira/browse/SPARK-11192);
> To replicate the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak I just 
> needed to run the code below:
> {code}
> val dstream = ssc.union(kinesisStreams)
> dstream.foreachRDD((streamInfo: RDD[Array[Byte]]) => {
>   val toyDF = streamInfo
>     .map(_ => (1, "data", "more data "))
>     .toDF("Num", "Data", "MoreData")
>   toyDF.agg(sum("Num")).first().get(0)
> })
> {code}
> 2) a huge amount of Array[Byte] (9 GB+)
> After some analysis, I noticed that most of the Array[Byte] were being 
> referenced by objects that were being referenced by SQLTaskMetrics. The 
> strangest thing is that those Array[Byte] were basically text that was 
> loaded in the executors, so they should never be in the driver at all!
> I still could not replicate the 2nd problem with simple code (the original 
> was complex, with data coming from S3, DynamoDB and other databases). However, 
> when I debug the application I can see that in Executor.scala, during 
> reportHeartBeat(), the data that should not be sent to the driver is being 
> added to "accumUpdates" which, as I understand, will be sent to the driver 
> for reporting.
> To be more precise, one of the taskRunners in the loop "for (taskRunner <- 
> runningTasks.values().asScala)" contains a GenericInternalRow with a lot of 
> data that should not go to the driver. The path in my case would be 
> taskRunner.task.metrics.externalAccums[2]._list[0]. This data is similar (if 
> not the same) to the data I see when I do a driver heap dump. 
> I guess that if the org.apache.spark.sql.execution.ui.SQLTaskMetrics leak is 
> fixed I would have less of this undesirable data in the driver and could 
> run my streaming app for a long period of time, but I think there will always 
> be some performance loss.
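
For reference, a self-contained sketch of the quoted snippet that can run locally: the SparkSession and StreamingContext setup the fragment assumes is spelled out, and a queueStream stands in for the Kinesis receivers.

{code:scala}
import scala.collection.mutable

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SqlTaskMetricsReproSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("sqltaskmetrics-repro-sketch")
      .getOrCreate()
    import spark.implicits._

    val ssc = new StreamingContext(spark.sparkContext, Seconds(1))
    // Stand-in for ssc.union(kinesisStreams): one queued RDD of byte arrays.
    val queue = mutable.Queue(spark.sparkContext.parallelize(Seq(Array[Byte](1, 2, 3))))
    val dstream = ssc.queueStream(queue)

    dstream.foreachRDD { (streamInfo: RDD[Array[Byte]]) =>
      // Same shape as the quoted fragment: build a toy DataFrame and run an
      // aggregation on every batch, which exercises the SQL UI metrics path.
      val toyDF = streamInfo
        .map(_ => (1, "data", "more data "))
        .toDF("Num", "Data", "MoreData")
      toyDF.agg(sum("Num")).first().get(0)
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(30000) // run briefly; the reporter ran for hours
    ssc.stop()
  }
}
{code}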






[jira] [Commented] (SPARK-17381) Memory leak org.apache.spark.sql.execution.ui.SQLTaskMetrics

2017-06-18 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053488#comment-16053488
 ] 

Deenbandhu Agarwal commented on SPARK-17381:


[~joaomaiaduarte] I am facing a similar kind of issue. I am running Spark 
Streaming in the production environment with 6 executors, each with 1 GB of 
memory and 1 core, and a driver with 3 GB. The Spark version used is 2.0.1. 
Objects of some linked list are accumulating over time in the JVM heap of the 
driver, and after 2-3 hours the GC becomes very frequent and jobs start queuing 
up. I tried your solution but in vain. We are not using a linked list anywhere.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-03-19 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931778#comment-15931778
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


Full GC is triggered many times, and its frequency increases with time because 
of the memory accumulated by that big object.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-03-19 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931771#comment-15931771
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


Yes, I have tried restricting the number of jobs retained in the UI to 200. 
Moreover, the default value for the number of retained batches is 1000 and my 
batch interval is 10 s, so 1000 batches take somewhere around 10,000 seconds, 
which is roughly 3 hours, but it keeps on accumulating after that. I think 
there is something else that is creating the problem.
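
For reference, a minimal sketch of how these retention settings are applied (the property names are the standard Spark ones, each defaulting to 1000; the values shown are only illustrative):

{code:scala}
import org.apache.spark.SparkConf

// Illustrative only: cap how much completed job/stage/batch bookkeeping the
// driver keeps for the UI. The defaults for all three properties are 1000.
object UiRetentionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ui-retention-sketch")
      .set("spark.ui.retainedJobs", "200")
      .set("spark.ui.retainedStages", "200")
      .set("spark.streaming.ui.retainedBatches", "200")
    conf.getAll.foreach { case (k, v) => println(s"$k=$v") }
  }
}
{code}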




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-03-19 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931765#comment-15931765
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


"It's not even clear it has GCed?"

The increase in total GC time is a clear indication that GC has run.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-03-17 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929979#comment-15929979
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


Any updates?




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-02-22 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879823#comment-15879823
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


I am using Scala 2.11.




[jira] [Comment Edited] (SPARK-19644) Memory leak in Spark Streaming

2017-02-21 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875702#comment-15875702
 ] 

Deenbandhu Agarwal edited comment on SPARK-19644 at 2/21/17 9:54 AM:
-

I have analysed the issue further. I performed the following experiments and 
analysed heap dumps using jvisualvm at regular intervals.

1. dstream.foreachRDD(rdd => rdd.map(r => someCaseClass(r)).take(10).foreach(println))

2. dstream.foreachRDD(rdd => rdd.map(r => someCaseClass(r)).toDF.show(10, false))

3. dstream.foreachRDD(rdd => rdd.map(r => someCaseClass(r)).toDS.show(10, false))

I observed that the number of instances of 
scala.collection.immutable.$colon$colon remains constant in scenario 1 but 
keeps increasing in scenarios 2 and 3. So I think there is something leaky in 
the toDS or toDF function; this may help you find the issue.


was (Author: deenbandhu):
I have analysed the issue further. I performed the following experiments and 
analysed heap dumps using jvisualvm at regular intervals.

1. dstream.foreachRDD(rdd => rdd.map(x => someCaseClass(x)).take(10).foreach(println))

2. dstream.foreachRDD(rdd => rdd.map(x => someCaseClass(x)).toDF.show(10, false))

3. dstream.foreachRDD(rdd => rdd.map(x => someCaseClass(x)).toDS.show(10, false))

I observed that the number of instances of 
scala.collection.immutable.$colon$colon remains constant in scenario 1 but 
keeps increasing in scenarios 2 and 3. So I think there is something leaky in 
the toDS or toDF function; this may help you find the issue.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-02-21 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875702#comment-15875702
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


I have analysed the issue further. I performed the following experiments and 
analysed heap dumps using jvisualvm at regular intervals.

1. dstream.foreachRDD(rdd => rdd.map(x => someCaseClass(x)).take(10).foreach(println))

2. dstream.foreachRDD(rdd => rdd.map(x => someCaseClass(x)).toDF.show(10, false))

3. dstream.foreachRDD(rdd => rdd.map(x => someCaseClass(x)).toDS.show(10, false))

I observed that the number of instances of 
scala.collection.immutable.$colon$colon remains constant in scenario 1 but 
keeps increasing in scenarios 2 and 3. So I think there is something leaky in 
the toDS or toDF function; this may help you find the issue.
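
For reference, a self-contained sketch of the three experiments (the case class and the input data are placeholders, and a queueStream stands in for the real input stream):

{code:scala}
import scala.collection.mutable

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder for the reporter's case class.
case class SomeCaseClass(value: String)

object LeakExperimentsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("leak-experiments-sketch")
      .getOrCreate()
    import spark.implicits._

    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
    // A queueStream stands in for the real input DStream.
    val queue = mutable.Queue(spark.sparkContext.parallelize(Seq("a", "b", "c")))
    val dstream = ssc.queueStream(queue)

    // Scenario 1: plain RDD operations; the reporter saw a constant instance count.
    dstream.foreachRDD(rdd => rdd.map(r => SomeCaseClass(r)).take(10).foreach(println))

    // Scenario 2: build a DataFrame each batch; the reporter saw the count grow.
    dstream.foreachRDD(rdd => rdd.map(r => SomeCaseClass(r)).toDF.show(10, false))

    // Scenario 3: build a Dataset each batch; the reporter saw the count grow.
    dstream.foreachRDD(rdd => rdd.map(r => SomeCaseClass(r)).toDS.show(10, false))

    ssc.start()
    ssc.awaitTerminationOrTimeout(60000) // run briefly; the reporter ran for hours
    ssc.stop()
  }
}
{code}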




[jira] [Comment Edited] (SPARK-19644) Memory leak in Spark Streaming

2017-02-20 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874614#comment-15874614
 ] 

Deenbandhu Agarwal edited comment on SPARK-19644 at 2/20/17 2:40 PM:
-

Sorry for the delayed response.

No, I didn't run it in the Spark shell. I ran it using spark-submit in client 
deploy mode on a standalone Spark cluster.
I ran Eclipse MAT on the heap dump and attached a screenshot of the dominator 
tree. I hope this will help you find the cause of the memory leak. 

I also attached the path to GC root of the object 
`scala.reflect.runtime.JavaUniverse` (for a smaller heap dump taken at 
application start).

When I checked the GC root for an object of 
`scala.collection.immutable.$colon$colon`, the path contains the same 
object (`scala.reflect.runtime.JavaUniverse`).


was (Author: deenbandhu):
Sorry for the delayed response.

No, I didn't run it in the Spark shell. I ran it using spark-submit in client 
deploy mode on a standalone Spark cluster.
I ran Eclipse MAT on the heap dump and attached a screenshot of the dominator 
tree. I hope this will help you find the cause of the memory leak. 

I also attached the path to GC root of the object 
`scala.reflect.runtime.JavaUniverse` (for a smaller heap dump taken at 
application start).

When I checked the GC root for an object of 
`scala.collection.immutable.$colon$colon`, the path contains the same 
object (`scala.reflect.runtime.JavaUniverse`).




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-02-20 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874614#comment-15874614
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


Sorry for the delayed response.

No, I didn't run it in the Spark shell. I ran it using spark-submit in client 
deploy mode on a standalone Spark cluster.
I ran Eclipse MAT on the heap dump and attached a screenshot of the dominator 
tree. I hope this will help you find the cause of the memory leak. 

I also attached the path to GC root of the object 
`scala.reflect.runtime.JavaUniverse` (for a smaller heap dump taken at 
application start).

When I checked the GC root for an object of 
`scala.collection.immutable.$colon$colon`, the path contains the same 
object (`scala.reflect.runtime.JavaUniverse`).




[jira] [Comment Edited] (SPARK-19644) Memory leak in Spark Streaming

2017-02-20 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874614#comment-15874614
 ] 

Deenbandhu Agarwal edited comment on SPARK-19644 at 2/20/17 2:42 PM:
-

Sorry for the delayed response.

No, I didn't run it in the Spark shell. I ran it using spark-submit in client 
deploy mode on a standalone Spark cluster.
I ran Eclipse MAT on the heap dump and attached a screenshot of the dominator 
tree. I hope this will help you find the cause of the memory leak. 

I also attached the path to GC root of the object 
`scala.reflect.runtime.JavaUniverse` (for a smaller heap dump taken at 
application start).

When I checked the path to GC root for an object of 
`scala.collection.immutable.$colon$colon`, the path contains the same 
object (`scala.reflect.runtime.JavaUniverse`).


was (Author: deenbandhu):
Sorry for the delayed response.

No, I didn't run it in the Spark shell. I ran it using spark-submit in client 
deploy mode on a standalone Spark cluster.
I ran Eclipse MAT on the heap dump and attached a screenshot of the dominator 
tree. I hope this will help you find the cause of the memory leak. 

I also attached the path to GC root of the object 
`scala.reflect.runtime.JavaUniverse` (for a smaller heap dump taken at 
application start).

When I checked the GC root for an object of 
`scala.collection.immutable.$colon$colon`, the path contains the same 
object (`scala.reflect.runtime.JavaUniverse`).




[jira] [Issue Comment Deleted] (SPARK-19644) Memory leak in Spark Streaming

2017-02-20 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Comment: was deleted

(was: This is for smaller heap dump taken at the application start.)




[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-02-20 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Attachment: Dominator_tree.png




[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-02-20 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Attachment: Path2GCRoot.png

This is for a smaller heap dump taken at application start.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-02-17 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871528#comment-15871528
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


We are not using any state or window operations and no checkpointing, so I 
don't think the app is retaining state.




[jira] [Comment Edited] (SPARK-19644) Memory leak in Spark Streaming

2017-02-17 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871528#comment-15871528
 ] 

Deenbandhu Agarwal edited comment on SPARK-19644 at 2/17/17 9:34 AM:
-

We are not using any state or window operations and no checkpointing, so I 
don't think the app is retaining state.


was (Author: deenbandhu):
We are not using any state or window operations and no checkpointing, so I 
don't think the app is retaining state.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-02-17 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871515#comment-15871515
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


Yes, that's right: running out of memory doesn't by itself mean a leak, but a 
gradual increase in heap size combined with the GC's inability to reclaim that 
memory is a memory leak. Ideally the number of linked-list objects should not 
keep increasing over time, and that increase suggests there is a memory leak.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-02-17 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871494#comment-15871494
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


And after 40-50 hours Full GC becomes so frequent that all cores of the 
machines are over-utilized, batches start to queue up in streaming, and I need 
to restart the streaming job.




[jira] [Comment Edited] (SPARK-19644) Memory leak in Spark Streaming

2017-02-17 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871477#comment-15871477
 ] 

Deenbandhu Agarwal edited comment on SPARK-19644 at 2/17/17 9:07 AM:
-

No, I had given just 2 GB of driver memory, and if there were no references to 
them, Full GC should clean them up; but they are not getting cleaned, which is 
why I think there is a memory leak.


was (Author: deenbandhu):
No, I had given just 2 GB of heap, and if there were no references to them, 
Full GC should clean them up; but they are not getting cleaned, which is why I 
think there is a memory leak.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-02-17 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871477#comment-15871477
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


No, I had given just 2 GB of heap, and if there were no references to them, 
Full GC should clean them up; but they are not getting cleaned, which is why I 
think there is a memory leak.




[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming

2017-02-17 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871444#comment-15871444
 ] 

Deenbandhu Agarwal commented on SPARK-19644:


You can ignore that change in memory, but if you look at the snapshot, the 
number of instances of the class scala.collection.immutable.$colon$colon has 
increased a lot, and it keeps increasing over time.




[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-02-16 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Affects Version/s: (was: 2.0.1)
   2.0.2
  Environment: 
3 AWS EC2 c3.xLarge
Number of cores - 3
Number of executors 3 
Memory to each executor 2GB

  was:
3 AWS EC2 c3.xLarge
Number of cores - 3
Number of executers 3 
Memory to each executor 2GB

   Labels: memory_leak performance  (was: performance)
  Component/s: (was: Structured Streaming)
   DStreams




[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-02-16 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Attachment: heapdump.png

Snapshot of the heap dump after 50 hours.




[jira] [Created] (SPARK-19644) Memory leak in Spark Streaming

2017-02-16 Thread Deenbandhu Agarwal (JIRA)
Deenbandhu Agarwal created SPARK-19644:
--

 Summary: Memory leak in Spark Streaming
 Key: SPARK-19644
 URL: https://issues.apache.org/jira/browse/SPARK-19644
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.0.1
 Environment: 3 AWS EC2 c3.xLarge
Number of cores - 3
Number of executers 3 
Memory to each executor 2GB
Reporter: Deenbandhu Agarwal
Priority: Critical


I am using Spark Streaming in production for some aggregation, fetching data 
from Cassandra and saving data back to Cassandra. 

I see a gradual increase in old-generation heap capacity from 1,161,216 bytes to 
1,397,760 bytes over a period of six hours.

After 50 hours of processing, the number of instances of the class 
scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge 
number. 

I think this is a clear case of a memory leak.






[jira] [Commented] (SPARK-15716) Memory usage of driver keeps growing up in Spark Streaming

2017-02-13 Thread Deenbandhu Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865229#comment-15865229
 ] 

Deenbandhu Agarwal commented on SPARK-15716:


What happened to this issue?

> Memory usage of driver keeps growing up in Spark Streaming
> --
>
> Key: SPARK-15716
> URL: https://issues.apache.org/jira/browse/SPARK-15716
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.4.1
> Environment: Oracle Java 1.8.0_51, 1.8.0_85, 1.8.0_91 and 1.8.0_92
> SUSE Linux, CentOS 6 and CentOS 7
>Reporter: Yan Chen
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Code:
> {code:java}
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> import org.apache.spark.SparkConf;
> import org.apache.spark.SparkContext;
> import org.apache.spark.streaming.Durations;
> import org.apache.spark.streaming.StreamingContext;
> import org.apache.spark.streaming.api.java.JavaPairDStream;
> import org.apache.spark.streaming.api.java.JavaStreamingContext;
> import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;
> public class App {
>   public static void main(String[] args) {
> final String input = args[0];
> final String check = args[1];
> final long interval = Long.parseLong(args[2]);
> final SparkConf conf = new SparkConf();
> conf.set("spark.streaming.minRememberDuration", "180s");
> conf.set("spark.streaming.receiver.writeAheadLog.enable", "true");
> conf.set("spark.streaming.unpersist", "true");
> conf.set("spark.streaming.ui.retainedBatches", "10");
> conf.set("spark.ui.retainedJobs", "10");
> conf.set("spark.ui.retainedStages", "10");
> conf.set("spark.worker.ui.retainedExecutors", "10");
> conf.set("spark.worker.ui.retainedDrivers", "10");
> conf.set("spark.sql.ui.retainedExecutions", "10");
> JavaStreamingContextFactory jscf = () -> {
>   SparkContext sc = new SparkContext(conf);
>   sc.setCheckpointDir(check);
>   StreamingContext ssc = new StreamingContext(sc, 
> Durations.milliseconds(interval));
>   JavaStreamingContext jssc = new JavaStreamingContext(ssc);
>   jssc.checkpoint(check);
>   // setup pipeline here
>   JavaPairDStream<LongWritable, Text> inputStream =
>   jssc.fileStream(
>   input,
>   LongWritable.class,
>   Text.class,
>   TextInputFormat.class,
>   (filepath) -> Boolean.TRUE,
>   false
>   );
>   JavaPairDStream<LongWritable, Text> usbk = inputStream
>   .updateStateByKey((current, state) -> state);
>   usbk.checkpoint(Durations.seconds(10));
>   usbk.foreachRDD(rdd -> {
> rdd.count();
> System.out.println("usbk: " + rdd.toDebugString().split("\n").length);
> return null;
>   });
>   return jssc;
> };
> JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(check, jscf);
> jssc.start();
> jssc.awaitTermination();
>   }
> }
> {code}
> Command used to run the code
> {code:none}
> spark-submit --keytab [keytab] --principal [principal] --class [package].App 
> --master yarn --driver-memory 1g --executor-memory 1G --conf 
> "spark.driver.maxResultSize=0" --conf "spark.logConf=true" --conf 
> "spark.executor.instances=2" --conf 
> "spark.executor.extraJavaOptions=-XX:+PrintFlagsFinal -XX:+PrintReferenceGC 
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
> -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions" --conf 
> "spark.driver.extraJavaOptions=-Xloggc:/[dir]/memory-gc.log 
> -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails 
> -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy 
> -XX:+UnlockDiagnosticVMOptions" [jar-file-path] file:///[dir-on-nas-drive] 
> [dir-on-hdfs] 200
> {code}
> It's a very simple piece of code; when I ran it, the memory usage of the driver 
> kept going up. There is no file input in our runs. The batch interval is set to 
> 200 milliseconds; processing time for each batch is below 150 milliseconds, 
> and most batches are below 70 milliseconds.
> !http://i.imgur.com/uSzUui6.png!
> The rightmost four red triangles are full GCs, which were triggered manually 
> using the "jcmd <pid> GC.run" command.
> I also did more experiments in the second and third comments I posted.






[jira] [Created] (SPARK-19283) Application details UI not visible for completed actions

2017-01-18 Thread Deenbandhu Agarwal (JIRA)
Deenbandhu Agarwal created SPARK-19283:
--

 Summary: Application details UI not visible for completed actions
 Key: SPARK-19283
 URL: https://issues.apache.org/jira/browse/SPARK-19283
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.0.2
 Environment: ubuntu 14.04 16GB RAM 16 cores
Reporter: Deenbandhu Agarwal
Priority: Minor


I just upgraded from Spark 1.6.1 to Spark 2.0.2 and tried to enable logging for 
completed jobs using 'spark.eventLog.enabled=true'. 
Log files are created in the specified directory, but no option is visible in 
the UI to see the details of the completed jobs. 
I checked the file permissions; I don't think that is the issue. 
This was working in Spark 1.6.1 with the same config.
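
For reference, a minimal sketch of the event-log settings involved (the directory is a placeholder; completed applications are normally browsed through the History Server, which reads the same directory via spark.history.fs.logDirectory):

{code:scala}
import org.apache.spark.SparkConf

// Illustrative only: the event-log settings discussed above, with a placeholder
// directory. The History Server must point at the same location.
object EventLogConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("eventlog-config-sketch")
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "file:///tmp/spark-events")
    conf.getAll.foreach { case (k, v) => println(s"$k=$v") }
  }
}
{code}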


