[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-07 Thread Marius Van Niekerk (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856905#comment-15856905
 ] 

Marius Van Niekerk commented on TOREE-374:
--

As for actual fixes for the cell REPL classes, I don't really think that is 
possible. It's certainly out of scope for Apache Toree. In practice, I tend to 
run Toree on beefier edge nodes, so I allocate at least 8g of memory per 
driver process.
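To check from a notebook cell how much heap the driver JVM actually received (for example, after asking for 8g), a quick sketch using only java.lang.Runtime could look like this. The value reported by maxMemory is approximate and may come out somewhat below the configured maximum.

```scala
// Inspect the heap limits of the JVM running this code (the driver, when run in a Toree cell).
val bytesPerMb = 1024L * 1024L
val runtime = Runtime.getRuntime

// Upper bound the JVM will try to use, roughly the configured -Xmx / driver memory.
val maxHeapMb = runtime.maxMemory / bytesPerMb
// Heap currently occupied by objects, live or not yet collected.
val usedHeapMb = (runtime.totalMemory - runtime.freeMemory) / bytesPerMb

println(s"max heap: $maxHeapMb MB, used: $usedHeapMb MB")
```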

> Variables declared on the Notebook are not garbage collected
> 
>
> Key: TOREE-374
> URL: https://issues.apache.org/jira/browse/TOREE-374
> Project: TOREE
>  Issue Type: Bug
>Affects Versions: 0.1.0
>Reporter: David Taieb
>
> I'm not sure if it's a bug or a limitation of the underlying Scala REPL.
> As part of supporting the PixieDust (https://github.com/ibm-cds-labs/pixiedust) 
> auto-visualization feature within the Scala gateway, I have implemented a weak 
> hashmap that tracks objects declared on the Scala REPL. However, I have found 
> that objects are not correctly gc'ed when the object is declared in a cell 
> with a val or var keyword and then the cell is run again. One would expect 
> that the original object has no more references and should be gc'ed, but it's 
> not. 
> However, when the object is declared with the var keyword and then set to null 
> in another cell, it is correctly gc'ed.
> I'm concerned that users who run the same cell multiple times would 
> unwittingly have memory leaks, which can eventually lead to OOM errors.
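The reporter's tracker itself is not shown in this thread. A minimal sketch of the general idea, assuming java.util.WeakHashMap and a hypothetical ReplObjectTracker name, could look like the following. Because keys are held weakly, an entry only disappears once nothing else strongly references the object, so a value still held by an old REPL wrapper never drops out of such a map.

```scala
import java.util.{Collections, WeakHashMap}

// Hypothetical tracker (not PixieDust's actual code): remembers which REPL name
// each object was bound to, without itself keeping the object alive.
object ReplObjectTracker {
  // Keys are weak: once an object is unreachable elsewhere, the GC may drop its entry.
  private val tracked =
    Collections.synchronizedMap(new WeakHashMap[AnyRef, String]())

  def track(name: String, obj: AnyRef): Unit = tracked.put(obj, name)

  // Names whose objects have not been reclaimed yet (stale entries are purged lazily).
  def liveNames: List[String] = tracked.synchronized {
    val it = tracked.values.iterator
    val names = List.newBuilder[String]
    while (it.hasNext) names += it.next()
    names.result().sorted
  }
}
```

WeakHashMap compares keys with equals, so two distinct but equal values (for example, equal case class instances) would share one entry; an identity-based weak map would avoid that.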



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-07 Thread Marius Van Niekerk (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856898#comment-15856898
 ] 

Marius Van Niekerk commented on TOREE-374:
--

So Zeppelin has a partial workaround for the valueOfTerm-style problem here:

https://github.com/apache/zeppelin/blob/master/spark/src/main/java/org/apache/zeppelin/spark/SparkInterpreter.java#L1134



[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-07 Thread David Taieb (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856266#comment-15856266
 ] 

David Taieb commented on TOREE-374:
---

[~lbustelo] I totally understand the technical limitation, but from the user's 
perspective it looks like a bug. At the very least we should document a 
best-practice workaround.
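One possible best-practice workaround, consistent with the var-then-null behaviour described in the issue, is to keep large values in a holder that is declared once and overwritten on re-run, instead of declaring a new val each time. The Slot class below is a hypothetical sketch, not part of Toree; whether the old value is actually collected still depends on the interpreter holding no other reference to it.

```scala
// Hypothetical holder, declared in a cell that is run only once.
final class Slot[T <: AnyRef] {
  private var current: Option[T] = None

  // Returns Unit on purpose: in the REPL an expression's result is bound to a
  // generated resN value, which would likely keep the stored object reachable.
  def set(value: T): Unit = current = Some(value)

  def get: Option[T] = current

  // Drops the reference, like setting a var to null in a later cell.
  def clear(): Unit = current = None
}

val bigData = new Slot[Array[Byte]]

// Re-runnable cell: overwriting the slot replaces the previous array's only strong reference.
bigData.set(new Array[Byte](1024 * 1024))
```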

Also, looking at the output from // show, I see this:

import $line22$read.$iw.$iw.$iw.$iw.$iw.$iw.x;
class $iw extends Serializable {
  def <init>() = {
    super.<init>;
    ()
  };
  val res4 = println(x)
};

I wonder how $line22$read.$iw.$iw.$iw.$iw.$iw.$iw.x is created, and whether we 
have an opportunity to clean it up in a pre_run_cell event.



[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-07 Thread Chip Senkbeil (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856197#comment-15856197
 ] 

Chip Senkbeil commented on TOREE-374:
-

Here's what I used to do. It still works on our 0.1.x branch, and I assume it 
will also work on master with Scala 2.11's REPL implementation.

{code}
val x = 3
println(x) // show
{code}

Just add {code}// show{code} at the end of your code; the space between the 
forward slashes and show is required.



[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-07 Thread Gino Bustelo (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856179#comment-15856179
 ] 

Gino Bustelo commented on TOREE-374:


[~chipsenkbeil] can you post here the trick for viewing the REPL's 
representation of the code?



[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-07 Thread Gino Bustelo (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856177#comment-15856177
 ] 

Gino Bustelo commented on TOREE-374:


I'm not a JVM expert, but [~dtaieb], keep in mind that you are not in a normal 
execution environment. Those steps do not take place in a single class: each 
time you execute a cell, the REPL generates a new class that imports the 
classes generated for previous cells. You essentially get a hierarchy of 
nested classes, and that may have side effects.

[~jodersky] raises a good point. Try the same sequence of calls in the plain 
Scala REPL and let's compare.
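The reachability argument above can be illustrated with a deliberately simplified model, assuming (as the thread suggests) that the interpreter keeps every generated wrapper reachable so later cells can import from earlier ones. ToySession and CellWrapper are hypothetical names; this is not how Toree or the Scala REPL is implemented, only the argument in miniature.

```scala
import scala.collection.mutable.ArrayBuffer

// One generated wrapper per executed cell; `value` plays the role of `var obj`.
final class CellWrapper(val lineId: Int, var value: AnyRef)

// Hypothetical session that, like the REPL, never discards earlier wrappers.
final class ToySession {
  private val lines = ArrayBuffer.empty[CellWrapper]

  // Re-running a cell creates a new wrapper; the previous one stays in `lines`.
  def runCell(value: AnyRef): CellWrapper = {
    val wrapper = new CellWrapper(lines.size + 1, value)
    lines += wrapper
    wrapper
  }

  // Everything still strongly reachable from the session, oldest first.
  def reachableValues: List[AnyRef] =
    lines.iterator.map(_.value).filter(_ ne null).toList
}

val session = new ToySession
val firstRun = session.runCell(new Object) // cell 2, first run
session.runCell(new Object)                // cell 2, re-run: the old value is still reachable
val beforeNull = session.reachableValues.size
firstRun.value = null                      // the `obj = null` workaround from the issue
val afterNull = session.reachableValues.size
```

In this model, re-running the cell leaves two reachable objects until the old wrapper's field is explicitly nulled, which matches the behaviour reported in the issue.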



[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-06 Thread David Taieb (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855262#comment-15855262
 ] 

David Taieb commented on TOREE-374:
---

[~jodersky] Here are simple steps to reproduce.

In cell 1, create a ReferenceQueue with the following code. Run cell 1 only 
once:
```
import scala.ref.WeakReference
import scala.ref.ReferenceQueue
val queue:ReferenceQueue[AnyRef] = new ReferenceQueue
```

In cell 2, create an object and a WeakReference to it:
```
var obj = new Object()
val weakRef = new WeakReference(obj, queue)
```
Run cell 2 twice. The expected behaviour is that the first instance of obj 
becomes unreachable, so its WeakReference is cleared and placed on the 
ReferenceQueue.

In cell 3, poll the ReferenceQueue:
```
System.gc()
println(queue.poll)
```
Run cell 3 and observe that it outputs None: no reference has been enqueued, 
so the first instance of obj has not been reclaimed.

Now the positive test: add obj = null to cell 3 as follows (note: that's why I 
used var in cell 2; a val can never be gc'ed this way, since you can't 
reassign it to drop the reference):
```
obj=null
System.gc()
println(queue.poll)
```
The output is Some(scala.ref.WeakReferenceWithWrapper@25ac93fd), which is expected.
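To run the same steps outside the notebook, as Gino suggests, the experiment can be packaged as a standalone Scala sketch. WeakRefExperiment and its pinFirst flag are hypothetical: pinFirst = true stores the first instance in a long-lived holder, standing in for a REPL wrapper that still references it. Since System.gc() is only a request, the unpinned case is expected, not guaranteed, to report collection; the pinned case can never report it.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.ref.{Reference, ReferenceQueue, WeakReference}

object WeakRefExperiment {
  // Long-lived strong references, standing in for an old REPL wrapper.
  private val pinned = ArrayBuffer.empty[AnyRef]

  /** Mirrors cells 1-3: returns true if the WeakReference to the first
    * instance of `obj` was enqueued after `obj` was rebound. */
  def firstInstanceCollected(pinFirst: Boolean, maxAttempts: Int = 10): Boolean = {
    pinned.clear()
    val queue = new ReferenceQueue[AnyRef]         // cell 1
    var obj: AnyRef = new Object()                 // cell 2, first run
    val weakRef = new WeakReference(obj, queue)
    if (pinFirst) pinned += obj
    obj = new Object()                             // cell 2, second run rebinds obj

    var enqueued: Option[Reference[AnyRef]] = None // cell 3
    var attempt = 0
    while (enqueued.isEmpty && attempt < maxAttempts) {
      System.gc()                                  // a request only; the JVM may ignore it
      enqueued = queue.remove(100L)                // wait up to 100 ms for an enqueued reference
      attempt += 1
    }
    // Using weakRef here also keeps the reference object itself reachable until the end.
    enqueued.exists(_ eq weakRef)
  }
}

println(s"unpinned collected: ${WeakRefExperiment.firstInstanceCollected(pinFirst = false)}")
println(s"pinned collected:   ${WeakRefExperiment.firstInstanceCollected(pinFirst = true)}")
```

If the unpinned case reports true in a plain JVM while the notebook does not, that points at references retained by the generated REPL classes rather than at the GC itself.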



[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-06 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855081#comment-15855081
 ] 

Jakob Odersky commented on TOREE-374:
-

[~dtaieb] Could you provide some steps to reproduce this?



[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-06 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855055#comment-15855055
 ] 

Jakob Odersky commented on TOREE-374:
-

Hmm, I wonder if this is related to the REPL's -Yrepl-class-based setting. 
This setting is a new, experimental feature of the Scala interpreter, but 
Spark requires it to work.

[~mariusvniekerk] I don't know anything about the comm API; how would that 
work here?



[jira] [Commented] (TOREE-374) Variables declared on the Notebook are not garbage collected

2017-02-06 Thread David Taieb (JIRA)

[ 
https://issues.apache.org/jira/browse/TOREE-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854852#comment-15854852
 ] 

David Taieb commented on TOREE-374:
---

[~mariusvniekerk] If there were a way to manually force the generated class to 
be gc'ed, we could listen for the pre_run_cell event on each cell execution and 
force the variables to be dereferenced and gc'ed. Thoughts?
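As a rough sketch of that proposal, assume a hypothetical CellCleanupRegistry (not an existing Toree or Jupyter API) where each cell registers cleanup actions, and whatever handles pre_run_cell invokes the actions left by a cell's previous run before it executes again. The cleanup itself is just the var-then-null step that the issue reports as working.

```scala
import scala.collection.mutable

// Hypothetical registry; how a pre_run_cell event would reach it is left open.
object CellCleanupRegistry {
  private val actions = mutable.Map.empty[String, List[() => Unit]]

  // Called from inside a cell to say how to release what that cell created.
  def register(cellId: String)(action: () => Unit): Unit = synchronized {
    actions(cellId) = action :: actions.getOrElse(cellId, Nil)
  }

  // Called before `cellId` re-runs: runs and forgets its pending actions
  // (most recently registered first) and returns how many ran.
  def beforeRun(cellId: String): Int = {
    val pending = synchronized { actions.remove(cellId).getOrElse(Nil) }
    pending.foreach(action => action())
    pending.size
  }
}

// Cell 2 body under this scheme: declare with var and register the reset.
var obj: AnyRef = new Object()
CellCleanupRegistry.register("cell-2")(() => obj = null)

// What the hook would do just before cell 2 is executed again.
val ran = CellCleanupRegistry.beforeRun("cell-2")
```

The registry only removes the references it was told about; anything else the generated classes keep (such as resN values) would still need separate handling.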
