[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)

2017-11-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19644:
-
Description: 
I am using Spark Streaming in production for some aggregation, fetching data 
from Cassandra and saving data back to Cassandra. 

I see a gradual increase in old-generation heap capacity from 1,161,216 bytes to 
1,397,760 bytes over a period of six hours.

After 50 hours of processing, the number of instances of class 
scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge 
number. 

I think this is a clear case of a memory leak.

Updated: The root cause is that creating an encoder object leaks several Scala 
internal objects, due to a Scala memory leak issue: 
https://github.com/scala/bug/issues/8302
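
For illustration only, here is a minimal sketch (not the reporter's actual job; the Reading case class and the job structure are made up) of the pattern this points at: deriving a fresh encoder over and over, e.g. once per micro-batch, goes through Scala runtime reflection each time and can retain internal objects, whereas deriving the encoder once and reusing it avoids the repeated reflection work.

{code:scala}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical record type, used only for this sketch.
case class Reading(sensorId: String, value: Double)

object EncoderLeakSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("encoder-leak-sketch")
      .master("local[*]")
      .getOrCreate()

    // Leak-prone pattern: deriving a new encoder on every call (for example
    // once per micro-batch). Each derivation goes through Scala runtime
    // reflection, which can retain internal objects (scala/bug#8302), so a
    // long-running streaming job slowly fills the old generation.
    def perBatchEncoder(): Encoder[Reading] = Encoders.product[Reading]

    // Workaround: derive the encoder once and reuse it for every batch.
    implicit val readingEncoder: Encoder[Reading] = Encoders.product[Reading]

    val ds = spark.createDataset(Seq(Reading("s1", 1.0), Reading("s2", 2.0)))
    ds.show()

    spark.stop()
  }
}
{code}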

  was:
I am using Spark Streaming in production for some aggregation, fetching data 
from Cassandra and saving data back to Cassandra. 

I see a gradual increase in old-generation heap capacity from 1,161,216 bytes to 
1,397,760 bytes over a period of six hours.

After 50 hours of processing, the number of instances of class 
scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge 
number. 

I think this is a clear case of a memory leak.

Updated: The major issue is that creating an encoder object leaks several Scala 
internal objects, due to a Scala memory leak issue.


> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> -
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, SQL, Structured Streaming
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Major
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.
> Updated: The root cause is that creating an encoder object leaks several 
> Scala internal objects, due to a Scala memory leak issue: 
> https://github.com/scala/bug/issues/8302






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)

2017-11-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19644:
-
Description: 
I am using Spark Streaming in production for some aggregation, fetching data 
from Cassandra and saving data back to Cassandra. 

I see a gradual increase in old-generation heap capacity from 1,161,216 bytes to 
1,397,760 bytes over a period of six hours.

After 50 hours of processing, the number of instances of class 
scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge 
number. 

I think this is a clear case of a memory leak.

Updated: The major issue is that creating an encoder object leaks several Scala 
internal objects, due to a Scala memory leak issue.

  was:
I am using Spark Streaming in production for some aggregation, fetching data 
from Cassandra and saving data back to Cassandra. 

I see a gradual increase in old-generation heap capacity from 1,161,216 bytes to 
1,397,760 bytes over a period of six hours.

After 50 hours of processing, the number of instances of class 
scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge 
number. 

I think this is a clear case of a memory leak.

Updated: The major issue is that creating an encoder object leaks several Scala 
internal objects, due to a Scala memory leak issue.


> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> -
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, SQL, Structured Streaming
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Major
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.
> Updated: The major issue is that creating an encoder object leaks several 
> Scala internal objects, due to a Scala memory leak issue.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)

2017-11-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19644:
-
Description: 
I am using Spark Streaming in production for some aggregation, fetching data 
from Cassandra and saving data back to Cassandra. 

I see a gradual increase in old-generation heap capacity from 1,161,216 bytes to 
1,397,760 bytes over a period of six hours.

After 50 hours of processing, the number of instances of class 
scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge 
number. 

I think this is a clear case of a memory leak.

Updated: The major issue is that creating an encoder object leaks several Scala 
internal objects, due to a Scala memory leak issue.

  was:
I am using Spark Streaming in production for some aggregation, fetching data 
from Cassandra and saving data back to Cassandra. 

I see a gradual increase in old-generation heap capacity from 1,161,216 bytes to 
1,397,760 bytes over a period of six hours.

After 50 hours of processing, the number of instances of class 
scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge 
number. 

I think this is a clear case of a memory leak.


> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> -
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, SQL, Structured Streaming
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Major
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.
> Updated: The major issue is that creating an encoder object leaks several 
> Scala internal objects, due to a Scala memory leak issue.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)

2017-11-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19644:
-
Component/s: Structured Streaming

> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> -
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, SQL, Structured Streaming
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Major
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-11-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19644:
-
Component/s: SQL

> Memory leak in Spark Streaming
> --
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, SQL
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Major
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)

2017-11-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19644:
-
Summary: Memory leak in Spark Streaming (Encoder/Scala Reflection)  (was: 
Memory leak in Spark Streaming)

> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> -
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, SQL
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Major
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-03-19 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-19644:
--
Priority: Major  (was: Critical)

The weird thing is the memory retained by the Scala runtime universe. I am still 
not clear on whether you are saying you actually run out of memory or not. I also 
don't recall any other reports like this. If you have leads, post them here. 
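
For reference, a small sketch (not from this ticket; the class name and sampling interval are illustrative) of one way to answer the run-out-of-memory question: log per-pool heap usage from inside the JVM so old-generation growth can be tracked over hours rather than inferred from a single heap dump.

{code:scala}
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Logs the usage of every heap memory pool (including the old generation)
// so growth can be sampled over time instead of read off one heap dump.
object OldGenMonitor {
  def logPools(): Unit =
    ManagementFactory.getMemoryPoolMXBeans.asScala.foreach { pool =>
      val usage = pool.getUsage
      println(s"${pool.getName}: used=${usage.getUsed} bytes, committed=${usage.getCommitted} bytes")
    }

  def main(args: Array[String]): Unit =
    while (true) {
      logPools()
      Thread.sleep(60 * 1000L) // sample once a minute (illustrative interval)
    }
}
{code}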

> Memory leak in Spark Streaming
> --
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-02-20 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Attachment: Dominator_tree.png

> Memory leak in Spark Streaming
> --
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Critical
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-02-20 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Attachment: Path2GCRoot.png

This is for a smaller heap dump taken at application start.

> Memory leak in Spark Streaming
> --
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Critical
>  Labels: memory_leak, performance
> Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-02-16 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Affects Version/s: (was: 2.0.1)
   2.0.2
  Environment: 
3 AWS EC2 c3.xLarge
Number of cores - 3
Number of executors 3 
Memory to each executor 2GB

  was:
3 AWS EC2 c3.xLarge
Number of cores - 3
Number of executers 3 
Memory to each executor 2GB

   Labels: memory_leak performance  (was: performance)
  Component/s: (was: Structured Streaming)
   DStreams

> Memory leak in Spark Streaming
> --
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Critical
>  Labels: memory_leak, performance
> Attachments: heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.






[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming

2017-02-16 Thread Deenbandhu Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deenbandhu Agarwal updated SPARK-19644:
---
Attachment: heapdump.png

Snapshot of the heap dump after 50 hours.
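
As a side note, a comparable dump can also be captured programmatically; a minimal sketch, assuming the standard HotSpot diagnostic MXBean is available (the output path is illustrative), is below. The resulting .hprof file can then be opened in a heap analyzer such as Eclipse MAT to get dominator-tree and GC-root views like the attached screenshots.

{code:scala}
import java.lang.management.ManagementFactory
import com.sun.management.HotSpotDiagnosticMXBean

// Writes a heap dump of live objects to disk, similar to what
// "jmap -dump:live,format=b,file=..." produces, so it can be inspected
// in a heap analyzer (dominator tree, paths to GC roots, histograms).
object HeapDumper {
  def dump(path: String): Unit = {
    val bean = ManagementFactory.newPlatformMXBeanProxy(
      ManagementFactory.getPlatformMBeanServer,
      "com.sun.management:type=HotSpotDiagnostic",
      classOf[HotSpotDiagnosticMXBean])
    bean.dumpHeap(path, /* live objects only = */ true)
  }

  def main(args: Array[String]): Unit =
    dump("/tmp/streaming-heap.hprof") // illustrative output path
}
{code}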

> Memory leak in Spark Streaming
> --
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.1
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executers 3 
> Memory to each executor 2GB
>Reporter: Deenbandhu Agarwal
>Priority: Critical
>  Labels: performance
> Attachments: heapdump.png
>
>
> I am using Spark Streaming in production for some aggregation, fetching data 
> from Cassandra and saving data back to Cassandra. 
> I see a gradual increase in old-generation heap capacity from 1,161,216 bytes 
> to 1,397,760 bytes over a period of six hours.
> After 50 hours of processing, the number of instances of class 
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a 
> huge number. 
> I think this is a clear case of a memory leak.


