[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2017-01-11 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818594#comment-15818594
 ] 

Sean Owen commented on SPARK-13198:
---

I think that's up to you if you're interested in this? it's not clear what the 
issue is, but it's also not supported usage.

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> Screen Shot 2016-02-08 at 10.03.04.png, gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2017-01-11 Thread Dmytro Bielievtsov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818498#comment-15818498
 ] 

Dmytro Bielievtsov commented on SPARK-13198:


[~srowen] Looks like a growing number of people needs this functionality. As 
some who knows the codebase, can you give a rough estimate of the amount of 
work it might take to make Spark guarantee a good cleanup, equivalent to the 
JVM shutdown? Or maybe one could hack this away by somehow restarting the 
corresponding JVM without exiting current python interpreter? If this is 
reasonable amount of work, I might try to cut out some of our team's time to 
work on the corresponding pull request.

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> Screen Shot 2016-02-08 at 10.03.04.png, gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Herman Schistad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136774#comment-15136774
 ] 

Herman Schistad commented on SPARK-13198:
-

Digging more into the dumped heap and running a memory leak report (using 
Eclipse Memory Analyzer) I'm seeing the following result:

!Screen Shot 2016-02-08 at 10.03.04.png|width=400!

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> Screen Shot 2016-02-08 at 10.03.04.png, gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136773#comment-15136773
 ] 

Sean Owen commented on SPARK-13198:
---

I don't think stop() is relevant here. There's not an active attempt to free up 
resources once the app is done. It's assumed the driver JVM is shutting down.

Yes, the question was whether it had tried to do a full GC, and sounds like it 
has done, OK.

Still if you're just finding there is a bunch of left over bookkeeping info for 
executors, probably from all the old contexts, I think that's "normal" or at 
least "not a problem as Spark is intended to be used"

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png, Screen 
> Shot 2016-02-08 at 09.30.59.png, Screen Shot 2016-02-08 at 09.31.10.png, 
> Screen Shot 2016-02-08 at 10.03.04.png, gc.log
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-08 Thread Herman Schistad (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136757#comment-15136757
 ] 

Herman Schistad commented on SPARK-13198:
-

Hi [~srowen], thanks for your reply. I have indeed tried to look at the program 
using a profiler and I've attached two screenshots from jvisualvm connected to 
the driver JMX interface. You can see that the "Old Gen" space is completely 
full. You see that dip at 09:30:00? That's me triggering a manual GC.

It might be unusual to do this, but in any case (given the existence of 
sc.stop()) it should work right? My use case is having X number of different 
parquet directories which need to be loaded and analysed linearly, as part of a 
generic platform where users are able to upload data and apply daily/hourly 
aggregations on them. I've also seen people starting and stopping contexts 
quite frequently when doing unit tests etc.

Using G1 garbage collection doesn't seem to affect the end result either.

I'm also attaching a GC log in it's raw format. You can see it's trying to do a 
full GC at multiple times during the execution of the program.

Thanks again Sean.

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-06 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135889#comment-15135889
 ] 

Sean Owen commented on SPARK-13198:
---

You've described a different problem, which you opened a duplicate JIRA for. 
This isn't a problem though; you can't change masters in one app.

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-05 Thread leo wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135036#comment-15135036
 ] 

leo wu commented on SPARK-13198:


Hi, Sean

 I am trying to do this similar in an IPython/Jupyter notebook by  stopping 
a sparkContext and then start a new one with new sparkconf over a remote Spark 
standalone cluster, instead of local master which is originally initialized,  
like :

import sys
from random import random
from operator import add
import atexit
import os
import platform

import py4j

import pyspark
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, HiveContext
from pyspark.storagelevel import StorageLevel


os.environ["SPARK_HOME"] = "/home/notebook/spark-1.6.0-bin-hadoop2.6"
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master spark://10.115.89.219:7077"
os.environ["SPARK_LOCAL_HOSTNAME"] = "wzymaster2011"

SparkContext.setSystemProperty("spark.master", "spark://10.115.89.219:7077")
SparkContext.setSystemProperty("spark.cores.max", "4")
SparkContext.setSystemProperty("spark.driver.host", "wzymaster2011")
SparkContext.setSystemProperty("spark.driver.port", "9000")
SparkContext.setSystemProperty("spark.blockManager.port", "9001")
SparkContext.setSystemProperty("spark.fileserver.port", "9002")

conf = SparkConf().setAppName("Leo-Python-Test")
sc = SparkContext(conf=conf)

However, I always get error on executor due to failing to load BlockManager 
info from driver but "localhost" , instead of setting in "spark.driver.host"  
like "wzymaster2011":

16/02/05 14:37:32 DEBUG BlockManager: Getting remote block broadcast_0_piece0 
from BlockManagerId(driver, localhost, 9002)
16/02/05 14:37:32 DEBUG TransportClientFactory: Creating new connection to 
localhost/127.0.0.1:9002
16/02/05 14:37:32 ERROR RetryingBlockFetcher: Exception while beginning fetch 
of 1 outstanding blocks
java.io.IOException: Failed to connect to localhost/127.0.0.1:9002
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at 
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)


So, I strongly suspect if there is a bug in SparkContext.stop() to clean up all 
data and resetting  SparkConf() doesn't work well within one app.

Please advise it. 
Millions of thanks


> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm r

[jira] [Commented] (SPARK-13198) sc.stop() does not clean up on driver, causes Java heap OOM.

2016-02-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134028#comment-15134028
 ] 

Sean Owen commented on SPARK-13198:
---

I don't see evidence of a problem here yet. Stuff stays on heap until it is 
GCed. Are you sure you triggered one like with a profiler and then measured?

You also generally would never stop and start a context in an app

> sc.stop() does not clean up on driver, causes Java heap OOM.
> 
>
> Key: SPARK-13198
> URL: https://issues.apache.org/jira/browse/SPARK-13198
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Herman Schistad
> Attachments: Screen Shot 2016-02-04 at 16.31.28.png, Screen Shot 
> 2016-02-04 at 16.31.40.png, Screen Shot 2016-02-04 at 16.31.51.png
>
>
> When starting and stopping multiple SparkContext's linearly eventually the 
> driver stops working with a "io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Java heap space" error.
> Reproduce by running the following code and loading in ~7MB parquet data each 
> time. The driver heap space is not changed and thus defaults to 1GB:
> {code:java}
> def main(args: Array[String]) {
>   val conf = new SparkConf().setMaster("MASTER_URL").setAppName("")
>   conf.set("spark.mesos.coarse", "true")
>   conf.set("spark.cores.max", "10")
>   for (i <- 1 until 100) {
> val sc = new SparkContext(conf)
> val sqlContext = new SQLContext(sc)
> val events = sqlContext.read.parquet("hdfs://locahost/tmp/something")
> println(s"Context ($i), number of events: " + events.count)
> sc.stop()
>   }
> }
> {code}
> The heap space fills up within 20 loops on my cluster. Increasing the number 
> of cores to 50 in the above example results in heap space error after 12 
> contexts.
> Dumping the heap reveals many equally sized "CoarseMesosSchedulerBackend" 
> objects (see attachments). Digging into the inner objects tells me that the 
> `executorDataMap` is where 99% of the data in said object is stored. I do 
> believe though that this is beside the point as I'd expect this whole object 
> to be garbage collected or freed on sc.stop(). 
> Additionally I can see in the Spark web UI that each time a new context is 
> created the number of the "SQL" tab increments by one (i.e. last iteration 
> would have SQL99). After doing stop and creating a completely new context I 
> was expecting this number to be reset to 1 ("SQL").
> I'm submitting the jar file with `spark-submit` and no special flags. The 
> cluster is running Mesos 0.23. I'm running Spark 1.6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org