how garbage collection works on parallelize

2016-01-08 Thread jluan
Hi,

I am curious about garbage collection of an object that gets parallelized. Say
we have a really large array (say 40 GB in RAM) that we want to
parallelize across our machines.

I have the following function:

def doSomething(): RDD[Double] = {
  // allocate a really big array on the driver (someReallyBigSize is a placeholder)
  val reallyBigArray = Array.fill[Double](someReallyBigSize)(0.0)
  sc.parallelize(reallyBigArray)
}

Theoretically, will reallyBigArray be marked for GC? Or will reallyBigArray
not be GC'd because parallelize somehow holds a reference to it?






Re: how garbage collection works on parallelize

2016-01-08 Thread Josh Rosen
It won't be GC'd as long as the RDD which results from `parallelize()` is
kept around; that RDD keeps strong references to the parallelized
collection's elements in order to enable fault-tolerance.
