[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856905#comment-15856905 ]

Marius Van Niekerk commented on TOREE-374:
------------------------------------------

As for actual fixes for the cell REPL classes, I don't really think that is possible. It's certainly out of scope for Apache Toree.

I tend to run Toree on some beefier edge nodes, so I allocate at least 8g of memory per driver process.

> Variables declared on the Notebook are not garbage collected
> ------------------------------------------------------------
>
> Key: TOREE-374
> URL: https://issues.apache.org/jira/browse/TOREE-374
> Project: TOREE
> Issue Type: Bug
> Affects Versions: 0.1.0
> Reporter: David Taieb
>
> I'm not sure if it's a bug or a limitation of the underlying Scala REPL.
> As part of supporting the PixieDust (https://github.com/ibm-cds-labs/pixiedust)
> auto-visualization feature within the Scala gateway, I have implemented a weak
> hashmap that tracks objects declared on the Scala REPL. However, I have found
> that objects are not correctly gc'ed when the object is declared in a cell
> with a val or var keyword and then the cell is run again. One would expect
> that the original object has no more references and should be gc'ed, but it's
> not.
> However, when the object is declared with the var keyword and then set to null in
> another cell, it is correctly gc'ed.
> I'm concerned that users who run the same cell multiple times would
> unwittingly have memory leaks which can eventually lead to OOM errors.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856898#comment-15856898 ]

Marius Van Niekerk commented on TOREE-374:
------------------------------------------

Zeppelin has a partial workaround for the valueOfTerm-style problem here:
https://github.com/apache/zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java#L1134
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856266#comment-15856266 ]

David Taieb commented on TOREE-374:
-----------------------------------

[~lbustelo] I totally understand the technical limitation, but from the user's perspective it looks like a bug. At the very least we should document a best-practice workaround.

Also, looking at the output from // show, I see this:

import $line22$read.$iw.$iw.$iw.$iw.$iw.$iw.x;
class $iw extends Serializable {
  def <init>() = {
    super.<init>;
    ()
  };
  val res4 = println(x)
};

I wonder how $line22$read.$iw.$iw.$iw.$iw.$iw.$iw.x is created, and whether we have an opportunity to clean it up within a pre_run_cell event?
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856197#comment-15856197 ]

Chip Senkbeil commented on TOREE-374:
-------------------------------------

Here's what I used to do. It still works with our 0.1.x branch, and I'm assuming it will also work on master with Scala 2.11's REPL implementation.

{code}
val x = 3
println(x)
// show
{code}

Just tack {code}// show{code} onto the end of your code. The space between the forward slashes and "show" is required.
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856179#comment-15856179 ]

Gino Bustelo commented on TOREE-374:
------------------------------------

[~chipsenkbeil] Can you post the trick for viewing the REPL's representation of the code here?
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856177#comment-15856177 ]

Gino Bustelo commented on TOREE-374:
------------------------------------

I'm not a JVM expert, but [~dtaieb], you need to consider that you are not in a normal execution environment. Those steps are not taking place in a single class. Each time you execute a cell, a class is generated that imports the classes generated for the previous cells. You basically get a hierarchy of nested classes, and there might be side effects to that.

[~jodersky] brings up a good point. Try that set of calls in the Scala REPL and let's compare.
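
One way to picture the side effect described above is the simplified model below. It is only a sketch under stated assumptions: `CellWrapper`, `executedCells`, and `runCell` are hypothetical names, not the classes the Scala REPL or Toree actually generate, and the model assumes the interpreter keeps every generated wrapper reachable so that later cells can import from it.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified model only: CellWrapper and executedCells are hypothetical
// stand-ins, not the classes the Scala REPL or Toree actually generate.
object ReplRetentionSketch {
  // One wrapper instance per cell execution, holding the value that cell defined.
  final class CellWrapper(val lineNo: Int, val obj: AnyRef)

  // Assumption: the interpreter keeps every wrapper reachable so that later
  // cells can import names from earlier ones.
  val executedCells: ArrayBuffer[CellWrapper] = ArrayBuffer.empty

  def runCell(obj: AnyRef): CellWrapper = {
    val wrapper = new CellWrapper(executedCells.size + 1, obj)
    executedCells += wrapper
    wrapper
  }

  def main(args: Array[String]): Unit = {
    val first = runCell(new Object)  // first run of the cell
    val second = runCell(new Object) // same cell run again
    // The name now resolves to the second wrapper, yet the first wrapper
    // (and the object it holds) is still strongly reachable.
    println(executedCells.size)      // prints 2
    println(first.obj ne second.obj) // prints true
  }
}
```

Under this model, re-running a cell shadows the old name but never drops the old wrapper, which would be consistent with the first `obj` never reaching the ReferenceQueue.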
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855262#comment-15855262 ]

David Taieb commented on TOREE-374:
-----------------------------------

[~jodersky] Simple steps to reproduce.

In cell 1, create a ReferenceQueue with the following code. Run cell 1 only once:

```
import scala.ref.WeakReference
import scala.ref.ReferenceQueue
val queue: ReferenceQueue[AnyRef] = new ReferenceQueue
```

In cell 2, create an object and a WeakReference to it:

```
var obj = new Object()
val weakRef = new WeakReference(obj, queue)
```

Run cell 2 twice. The expected behaviour is that the first instance of obj is marked for gc and placed in the ReferenceQueue.

In cell 3, poll the ReferenceQueue:

```
System.gc()
println(queue.poll)
```

Run cell 3 and observe that it outputs None. No object has been marked for deletion.

Now the positive test. Add obj = null in cell 3 as shown (note: that's why I used var in cell 2; it also means that a val can never be gc'ed, since you can't clear the reference):

```
obj = null
System.gc()
println(queue.poll)
```

Output is: Some(scala.ref.WeakReferenceWithWrapper@25ac93fd), which is expected.
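
For comparison outside the notebook, the same calls can be run as an ordinary program. The sketch below is a baseline under assumptions, not part of the original report: `WeakRefBaseline` and `clearedAfterGc` are illustrative names, and because `System.gc()` is only a hint to the JVM, the second result is not guaranteed on every runtime.

```scala
import scala.ref.{ReferenceQueue, WeakReference}

// Baseline outside the notebook: the calls from cells 1 to 3 in one method.
// WeakRefBaseline and clearedAfterGc are illustrative names.
object WeakRefBaseline {
  // Returns (referent visible before clearing, reference enqueued after gc).
  def clearedAfterGc(timeoutMs: Long): (Boolean, Boolean) = {
    val queue = new ReferenceQueue[AnyRef]
    var obj: AnyRef = new Object()
    val weakRef = new WeakReference(obj, queue)
    val aliveBefore = weakRef.get.isDefined // obj still holds a strong reference

    obj = null  // drop the only strong reference, as in the positive test
    System.gc() // only a hint; the JVM is free to ignore it

    // remove(timeout) waits for the reference to be enqueued instead of
    // polling once; weakRef is read afterwards so it stays reachable.
    val enqueued = queue.remove(timeoutMs).isDefined && weakRef.get.isEmpty
    (aliveBefore, enqueued)
  }

  def main(args: Array[String]): Unit = {
    val (aliveBefore, enqueued) = clearedAfterGc(1000L)
    println(s"alive before clearing: $aliveBefore, enqueued after gc: $enqueued")
  }
}
```

If the reference is enqueued here but not after re-running cell 2 in the notebook, that points at something in the REPL still holding the first object rather than at the weak-reference code itself.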
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855081#comment-15855081 ]

Jakob Odersky commented on TOREE-374:
-------------------------------------

[~dtaieb] Could you provide some steps to reproduce this?
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855055#comment-15855055 ]

Jakob Odersky commented on TOREE-374:
-------------------------------------

Hmm, I wonder if this is related to the -Yrepl-class-based setting of the REPL. This setting is a new/experimental feature of the Scala interpreter, but it is required for Spark to work.

[~mariusvniekerk] I don't know anything about the comm API. How would that work here?
[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected
[ https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854852#comment-15854852 ]

David Taieb commented on TOREE-374:
-----------------------------------

[~mariusvniekerk] If there were a way to manually force the generated class to be gc'ed, we could use the pre_run_cell event to detect when a cell is about to be executed and force its variables to be dereferenced and gc'ed. Thoughts?