[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247973#comment-16247973 ] Apache Spark commented on SPARK-19644: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/19718 > Memory leak in Spark Streaming (Encoder/Scala Reflection) > - > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams, SQL, Structured Streaming >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak > Updated: The root cause is when creating an encoder object, it leaks several > Scala internal objects due to a Scala memory leak issue: > https://github.com/scala/bug/issues/8302 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243171#comment-16243171 ] Apache Spark commented on SPARK-19644: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/19687 > Memory leak in Spark Streaming (Encoder/Scala Reflection) > - > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams, SQL, Structured Streaming >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak > Updated: The root cause is when creating an encoder object, it leaks several > Scala internal objects due to a Scala memory leak issue: > https://github.com/scala/bug/issues/8302 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming (Encoder/Scala Reflection)
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234655#comment-16234655 ] Shixiong Zhu commented on SPARK-19644: -- I added more components since it also affects them. The major issue is when creating an encoder object, it leaks several Scala internal objects due to a Scala memory leak issue. > Memory leak in Spark Streaming (Encoder/Scala Reflection) > - > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams, SQL, Structured Streaming >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Major > Labels: memory_leak, performance > Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234643#comment-16234643 ] Shixiong Zhu commented on SPARK-19644: -- By the way, you can confirm this issue by checking if the number of "scala.reflect.internal.tpe.TypeConstraints$TypeConstraint" is large. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Major > Labels: memory_leak, performance > Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234638#comment-16234638 ] Shixiong Zhu commented on SPARK-19644: -- I happened to investigate a similar issue and found out the leak is caused by Scala reflection. Please see my comment in https://github.com/scala/bug/issues/8302 My workaround is calling "scala.reflect.runtime.universe.asInstanceOf[scala.reflect.runtime.JavaUniverse].undoLog.clear()" manually to clean up these garbage objects. You need to put this line in the same thread that you create Dataset/DataFrame as the leak happens in a thread local object. I think the best place in Spark streaming is foreachRDD. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Major > Labels: memory_leak, performance > Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233756#comment-16233756 ] deng commented on SPARK-19644: -- did the issue has been fixed? I am using spark 2.1.0 and i also encount the same scene. The driver memory keep increasing. I analysis the dump heap and find the class scala.collection.immutable.$colon$colon keep increasing. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Major > Labels: memory_leak, performance > Attachments: Dominator_tree.png, Path2GCRoot.png, heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053509#comment-16053509 ] Dongjoon Hyun commented on SPARK-19644: --- I see. Thank you anyway, [~deenbandhu]. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053502#comment-16053502 ] Deenbandhu Agarwal commented on SPARK-19644: And the spark cassandra connector is also not out for those spark version. which is a dependency for us > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053498#comment-16053498 ] Deenbandhu Agarwal commented on SPARK-19644: yes i can try but is there any report of such events in that particular version > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053494#comment-16053494 ] Dongjoon Hyun commented on SPARK-19644: --- Hi, [~deenbandhu]. Could you try this with the latest versions like 2.1.1 or 2.2.0-RC4? > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931778#comment-15931778 ] Deenbandhu Agarwal commented on SPARK-19644: full gc is trigger so many times and its frequency increases with time because of accumulated memory of that big object > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931774#comment-15931774 ] Sean Owen commented on SPARK-19644: --- Use jcmd to trigger a full GC on the process to see what happens. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931771#comment-15931771 ] Deenbandhu Agarwal commented on SPARK-19644: Yes i have tried restricting the number of jobs retained in UI to 200 and moreover the default value is 1000 for number of retained batches and my batch interval is 10s so for 1000 batches it will take somewhere around 1 sec which is equal to 3-4 hrs but it keeps on accumulating after that. I think there is something else which is creating problem > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931768#comment-15931768 ] Sean Owen commented on SPARK-19644: --- I don't see any GC time here. Is this not simple stuff like you have lots of old jobs and stage metrics info in the driver? There is not much memory used here compared to normal operation. Unless you've tried stuff like restricting the number of retained jobs in the UI I don't think there is evidence of a problem. The dumps don't show anything that odd. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931765#comment-15931765 ] Deenbandhu Agarwal commented on SPARK-19644: It's not even clear it has GCed ? The increase in total GC time is a clear indication of GC > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931150#comment-15931150 ] Sean Owen commented on SPARK-19644: --- I don't think there is evidence of a memory leak here. It's not even clear it has GCed > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929979#comment-15929979 ] Deenbandhu Agarwal commented on SPARK-19644: Any updates ?? > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879823#comment-15879823 ] Deenbandhu Agarwal commented on SPARK-19644: I am using scala 2.11 > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879397#comment-15879397 ] Shixiong Zhu commented on SPARK-19644: -- [~deenbandhu] Do you use Scala 2.10 or Scala 2.11? > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875702#comment-15875702 ] Deenbandhu Agarwal commented on SPARK-19644: I have analysed the issue more. I performed some of the experiments as follows and analysed the heapdump using jvisualvm at some intervals. 1. Dstream.foreachRdd(rdd => rdd.map(x => someCaseClass(x)).take(10).foreach(println)) 2. Dstream.foreachRdd(rdd => rdd.map(x => someCaseClass(x)).toDF.show(10,false)) 3. Dstream.foreachRdd(rdd => rdd.map(x => someCaseClass(x)).toDS.show(10,false)) I Observed that the number of instances of scala.collection.immutable.$colon$colon remain constant in 1 scenario but it keeps on increasing in 2 and 3 scenario. So I think there is something leaky in toDS or toDF function this may help you out to find out the issue. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874614#comment-15874614 ] Deenbandhu Agarwal commented on SPARK-19644: Sorry for the delayed response. No, I didn't run in spark shell. I ran using spark submit in client deploy mode on a standalone spark cluster. I ran eclipse MAT on heap dump and attached the screenshot of dominator tree. I hope this will help you out to find the cause of memory leak. Also attached Path to GC root of the object of `scala.reflect.runtime.JavaUniverse` (for smaller heap dump taken at the application start). When I checked GC root for an object of `scala.collection.immutable.$colon$colon` the path contains the same object(`scala.reflect.runtime.JavaUniverse`) > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: Dominator_tree.png, heapdump.png, Path2GCRoot.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872672#comment-15872672 ] Shixiong Zhu commented on SPARK-19644: -- [~deenbandhu] Could you check the GC root, please? These objects are from Scala reflection. Did you run the job in Spark shell? > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871528#comment-15871528 ] Deenbandhu Agarwal commented on SPARK-19644: We are not using any state or window operation and not using any check pointing so i don't think that app is retaining state > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871519#comment-15871519 ] Sean Owen commented on SPARK-19644: --- This is still not necessarily true in general. Your app could be retaining state. This is why this isn't actionable as-is. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871515#comment-15871515 ] Deenbandhu Agarwal commented on SPARK-19644: Yes that's right running out of memory doesn't mean a leak but gradual increase in heap size and inability of GC to clear the memory is a memory leak. Ideally the number of linked list objects should not be increasing over the period of time and that increase is suggesting that there is a memory leak. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871503#comment-15871503 ] Sean Owen commented on SPARK-19644: --- You only show 400MB of lists in your screen dump. Running out of memory doesn't mean a leak. The question is what is holding on to the memory? This isn't a big heap to begin with. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871494#comment-15871494 ] Deenbandhu Agarwal commented on SPARK-19644: And after 40-50 hours full gc is too frequent that all cores of machines are over utilized and batches start to queue up in streaming and I need to restart the streaming > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871477#comment-15871477 ] Deenbandhu Agarwal commented on SPARK-19644: No i had just given 2 GB heap and if there is not any reference to them the Full GC should clean them but it is not get cleaned thats why i think there is memory leak > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871452#comment-15871452 ] Sean Owen commented on SPARK-19644: --- There are just linked list objects. Why are they too high? if you have plenty of heap, Java won't bother GCing until it needs to. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871444#comment-15871444 ] Deenbandhu Agarwal commented on SPARK-19644: you can ignore that change in memory but if you look in the snapshot the number of instances of class scala.collection.immutable.$colon$colon it had increased too high and it keep on increasing over the period of time > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19644) Memory leak in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871436#comment-15871436 ] Sean Owen commented on SPARK-19644: --- What you have described so far is not a memory leak in Spark. It's normal for the heap to grow unless it has reason to even garbage collect. You're not evidently running out of memory. You're talking about a heap change of 1.1 to 1.3MB, which is trivial (is this a typo?). I'd close this unless you have a clearer case. > Memory leak in Spark Streaming > -- > > Key: SPARK-19644 > URL: https://issues.apache.org/jira/browse/SPARK-19644 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.2 > Environment: 3 AWS EC2 c3.xLarge > Number of cores - 3 > Number of executors 3 > Memory to each executor 2GB >Reporter: Deenbandhu Agarwal >Priority: Critical > Labels: memory_leak, performance > Attachments: heapdump.png > > > I am using streaming on the production for some aggregation and fetching data > from cassandra and saving data back to cassandra. > I see a gradual increase in old generation heap capacity from 1161216 Bytes > to 1397760 Bytes over a period of six hours. > After 50 hours of processing instances of class > scala.collection.immutable.$colon$colon incresed to 12,811,793 which is a > huge number. > I think this is a clear case of memory leak -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org