[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19644:
---------------------------------
Description:
I am using streaming in production for some aggregation, fetching data from Cassandra and saving data back to Cassandra.
I see a gradual increase in old generation heap capacity from 1161216 bytes to 1397760 bytes over a period of six hours.
After 50 hours of processing, instances of class scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.
I think this is a clear case of a memory leak.

Updated: The root cause is that creating an encoder object leaks several Scala internal objects due to a Scala memory leak issue: https://github.com/scala/bug/issues/8302

was:
I am using streaming in production for some aggregation, fetching data from Cassandra and saving data back to Cassandra.
I see a gradual increase in old generation heap capacity from 1161216 bytes to 1397760 bytes over a period of six hours.
After 50 hours of processing, instances of class scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.
I think this is a clear case of a memory leak.

Updated: The major issue is that creating an encoder object leaks several Scala internal objects due to a Scala memory leak issue

> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> ---------------------------------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams, SQL, Structured Streaming
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Major
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.
> Updated: The root cause is that creating an encoder object leaks several
> Scala internal objects due to a Scala memory leak issue:
> https://github.com/scala/bug/issues/8302

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

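The updated description attributes the leak to encoder creation: every time an encoder is created, some Scala internal objects are retained (scala/bug#8302). If that is right, leaked memory grows with the number of encoder creations. It does not grow with data volume. A streaming job that builds a fresh encoder in every batch would then leak a little per batch. A job that builds the encoder once and reuses it would leak only once. The Python toy model below illustrates that reasoning under that assumption only. `derive_encoder`, `cached_encoder`, `LEAKED_STATE` and `Event` are hypothetical names, not Spark or Scala APIs.

```python
import functools
import typing

# Toy stand-in for internal objects that stay reachable after each encoder
# is created (the report cites scala/bug#8302). Nothing ever clears it.
LEAKED_STATE = []


def derive_encoder(cls):
    """Hypothetical encoder derivation: inspects `cls` and, as a side
    effect, retains one entry in LEAKED_STATE per call."""
    LEAKED_STATE.append(("reflection-artifact", cls.__name__))
    return {"type": cls.__name__, "fields": sorted(typing.get_type_hints(cls))}


@functools.lru_cache(maxsize=None)
def cached_encoder(cls):
    """Derive the encoder for `cls` on first use, then return the cached one."""
    return derive_encoder(cls)


class Event:
    key: str
    count: int


def run_batches(n_batches, get_encoder):
    """Simulate a streaming job that needs an encoder in every batch and
    return how many leaked entries exist afterwards."""
    for _ in range(n_batches):
        encoder = get_encoder(Event)
        # ... encode and write the batch with `encoder` here ...
    return len(LEAKED_STATE)


LEAKED_STATE.clear()
print(run_batches(1000, derive_encoder))  # new encoder per batch -> 1000
LEAKED_STATE.clear()
print(run_batches(1000, cached_encoder))  # encoder created once -> 1
```

In the model, the per-batch variant ends with one retained entry per batch, while the cached variant ends with a single entry. This report does not establish whether a particular Spark job re-creates encoders per batch. That has to be checked in the job's own code.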
[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19644:
---------------------------------
Description:
I am using streaming in production for some aggregation, fetching data from Cassandra and saving data back to Cassandra.
I see a gradual increase in old generation heap capacity from 1161216 bytes to 1397760 bytes over a period of six hours.
After 50 hours of processing, instances of class scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.
I think this is a clear case of a memory leak.

Updated: The major issue is that creating an encoder object leaks several Scala internal objects due to a Scala memory leak issue

was:
I am using streaming in production for some aggregation, fetching data from Cassandra and saving data back to Cassandra.
I see a gradual increase in old generation heap capacity from 1161216 bytes to 1397760 bytes over a period of six hours.
After 50 hours of processing, instances of class scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.
I think this is a clear case of a memory leak.

Updated: The major issue is that creating an encoder object leaks several Scala internal objects due to a Scala memory leak issue.

> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> ---------------------------------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams, SQL, Structured Streaming
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Major
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.
> Updated: The major issue is that creating an encoder object leaks several
> Scala internal objects due to a Scala memory leak issue

[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19644:
---------------------------------
Description:
I am using streaming in production for some aggregation, fetching data from Cassandra and saving data back to Cassandra.
I see a gradual increase in old generation heap capacity from 1161216 bytes to 1397760 bytes over a period of six hours.
After 50 hours of processing, instances of class scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.
I think this is a clear case of a memory leak.

Updated: The major issue is that creating an encoder object leaks several Scala internal objects due to a Scala memory leak issue.

was:
I am using streaming in production for some aggregation, fetching data from Cassandra and saving data back to Cassandra.
I see a gradual increase in old generation heap capacity from 1161216 bytes to 1397760 bytes over a period of six hours.
After 50 hours of processing, instances of class scala.collection.immutable.$colon$colon increased to 12,811,793, which is a huge number.
I think this is a clear case of a memory leak.

> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> ---------------------------------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams, SQL, Structured Streaming
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Major
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.
> Updated: The major issue is that creating an encoder object leaks several
> Scala internal objects due to a Scala memory leak issue.

[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19644:
---------------------------------
Component/s: Structured Streaming

> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> ---------------------------------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams, SQL, Structured Streaming
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Major
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.

[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19644:
---------------------------------
Component/s: SQL

> Memory leak in Spark Streaming
> ------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams, SQL
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Major
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.

[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated SPARK-19644:
---------------------------------
Summary: Memory leak in Spark Streaming (Encoder/Scala Reflection) (was: Memory leak in Spark Streaming)

> Memory leak in Spark Streaming (Encoder/Scala Reflection)
> ---------------------------------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams, SQL
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Major
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.

[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-19644:
------------------------------
Priority: Major (was: Critical)

The weird thing is the memory retained by the Scala runtime universe. I am still not clear whether you are saying you run out of memory or not. I also don't recall any other reports like this. If you have leads, post them here.

> Memory leak in Spark Streaming
> ------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.

[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deenbandhu Agarwal updated SPARK-19644:
---------------------------------------
Attachment: Dominator_tree.png

> Memory leak in Spark Streaming
> ------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Critical
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.

[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deenbandhu Agarwal updated SPARK-19644:
---------------------------------------
Attachment: Path2GCRoot.png

This is for a smaller heap dump taken at application start.

> Memory leak in Spark Streaming
> ------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Critical
> Labels: memory_leak, performance
> Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.

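The reporter compares a heap dump taken at application start with one taken after 50 hours. Class histograms support a similar comparison: rank classes by how much their instance count grew. A steadily climbing count, like the scala.collection.immutable.$colon$colon figure in the description, is the kind of signal to look for. The sketch below assumes the row layout printed by the JDK's `jmap -histo <pid>` (`num: #instances #bytes class name`). `parse_histo` and `top_growth` are hypothetical helper names.

```python
import re

# Assumed row shape of a `jmap -histo` class histogram, e.g.
#    1:      12345      296280  scala.collection.immutable.$colon$colon
# Header, separator and "Total" lines do not match and are skipped.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text):
    """Map class name -> instance count. Counts for the same class name
    (e.g. loaded by different class loaders) are summed."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            name = m.group(3)
            counts[name] = counts.get(name, 0) + int(m.group(1))
    return counts


def top_growth(before, after, n=5):
    """Return the n (class, growth) pairs with the largest increase in
    instance count between two histogram texts."""
    old, new = parse_histo(before), parse_histo(after)
    growth = {cls: cnt - old.get(cls, 0) for cls, cnt in new.items()}
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Take one histogram shortly after start-up and another several hours later, then pass both texts to `top_growth`. `jmap -histo:live <pid>` counts only reachable objects, forcing a full GC first. That helps separate a genuine leak from garbage that has simply not been collected yet.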
[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deenbandhu Agarwal updated SPARK-19644:
---------------------------------------
Affects Version/s: 2.0.2 (was: 2.0.1)
Environment: 3 AWS EC2 c3.xLarge
Number of cores - 3
Number of executors 3
Memory to each executor 2GB

was:
3 AWS EC2 c3.xLarge
Number of cores - 3
Number of executers 3
Memory to each executor 2GB

Labels: memory_leak performance (was: performance)
Component/s: DStreams (was: Structured Streaming)

> Memory leak in Spark Streaming
> ------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.0.2
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executors 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Critical
> Labels: memory_leak, performance
> Attachments: heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.

[jira] [Updated] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deenbandhu Agarwal updated SPARK-19644:
---------------------------------------
Attachment: heapdump.png

Snapshot of the heap dump after 50 hours.

> Memory leak in Spark Streaming
> ------------------------------
>
> Key: SPARK-19644
> URL: https://issues.apache.org/jira/browse/SPARK-19644
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.0.1
> Environment: 3 AWS EC2 c3.xLarge
> Number of cores - 3
> Number of executers 3
> Memory to each executor 2GB
> Reporter: Deenbandhu Agarwal
> Priority: Critical
> Labels: performance
> Attachments: heapdump.png
>
>
> I am using streaming in production for some aggregation, fetching data
> from Cassandra and saving data back to Cassandra.
> I see a gradual increase in old generation heap capacity from 1161216 bytes
> to 1397760 bytes over a period of six hours.
> After 50 hours of processing, instances of class
> scala.collection.immutable.$colon$colon increased to 12,811,793, which is a
> huge number.
> I think this is a clear case of a memory leak.
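Some arithmetic puts the old-generation figures in the description in perspective. This sketch takes the two values and the six-hour window from the text; the variable names are illustrative. The description reports heap *capacity*, which is not the same as occupancy. Read literally as bytes, the values are only about 1.1 MB, small next to 2 GB executors, so the unit may be worth confirming. The relative growth is the same in any unit.

```python
# Old-generation capacity figures quoted in the description, in the unit
# reported there; the six-hour window is also taken from the text.
START_CAPACITY = 1_161_216
END_CAPACITY = 1_397_760
HOURS = 6

delta = END_CAPACITY - START_CAPACITY          # absolute increase
per_hour = delta / HOURS                       # average increase per hour
growth_pct = 100 * delta / START_CAPACITY      # relative increase

print(f"growth: {delta} ({growth_pct:.1f}%), {per_hour:.0f} per hour")
# prints: growth: 236544 (20.4%), 39424 per hour
```

An average over one six-hour window says nothing about whether growth continues at that rate. That is why a measurement of occupancy after full GCs, taken over a longer run, is more informative about whether the executors would eventually run out of memory.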