It won't be GC'd as long as the RDD which results from `parallelize()` is kept around; that RDD keeps strong references to the parallelized collection's elements in order to enable fault-tolerance.
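A minimal sketch of that lifetime, using a small stand-in array rather than the 40GB one from your example (the object name, array size, and local master are just for illustration):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    object ParallelizeGcDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("parallelize-gc").setMaster("local[*]"))

        def doSomething(): RDD[Double] = {
          // Stand-in for the "really big" array from the question.
          val reallyBigArray = Array.fill(1000000)(math.random)
          // The RDD returned here keeps a reference to reallyBigArray,
          // so the array cannot be GC'd on the driver while this RDD
          // (or anything derived from its lineage) is still reachable.
          sc.parallelize(reallyBigArray)
        }

        val rdd = doSomething()
        println(rdd.sum())
        // Only once `rdd` itself becomes unreachable (e.g. after it goes
        // out of scope and nothing else holds it) can the backing array
        // be collected along with it.

        sc.stop()
      }
    }

So the array's lifetime is tied to the lifetime of the RDD on the driver, not to the scope of doSomething() alone.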
On Fri, Jan 8, 2016 at 6:50 PM, jluan <jaylu...@gmail.com> wrote:
> Hi,
>
> I am curious about garbage collection on an object which gets parallelized.
> Say we have a really large array (say 40GB in RAM) that we want to
> parallelize across our machines.
>
> I have the following function:
>
> def doSomething(): RDD[Double] = {
>   val reallyBigArray = Array[Double](some really big value)
>   sc.parallelize(reallyBigArray)
> }
>
> Theoretically, will reallyBigArray be marked for GC? Or will reallyBigArray
> not be GC'd because parallelize somehow has a reference to reallyBigArray?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/how-garbage-collection-works-on-parallelize-tp25926.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.