Yes, Spark automatically evicts old RDD blocks from the cache in least-recently-used (LRU) order when it needs room for new ones. Calling unpersist() forces it to drop an RDD's blocks right away. In both cases, though, note that the JVM only garbage-collects the released objects at some later point.
Matei

On Mar 19, 2014, at 7:22 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Related question:
>
> If I keep creating new RDDs and cache()-ing them, does Spark automatically
> unpersist the least recently used RDD when it runs out of memory? Or is an
> explicit unpersist the only way to get rid of an RDD (barring the PR
> Tathagata mentioned)?
>
> Also, does unpersist()-ing an RDD immediately free up space, or just allow
> that space to be reclaimed when needed?
>
>
> On Wed, Mar 19, 2014 at 7:01 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>
> Just a heads-up, there is an active pull request that will automatically
> unpersist RDDs that are no longer referenced or in scope in the application.
>
> TD
>
>
> On Wed, Mar 19, 2014 at 6:58 PM, hequn cheng <chenghe...@gmail.com> wrote:
>
> persist and unpersist.
> unpersist: Mark the RDD as non-persistent, and remove all blocks for it from
> memory and disk.
>
>
> 2014-03-19 16:40 GMT+08:00 林武康 <vboylin1...@gmail.com>:
>
> Hi, can anyone tell me about the lifecycle of an RDD? I searched the
> official website and still can't figure it out. Can I use an RDD in some
> stages and then destroy it to release memory, given that no later stage
> will use this RDD any more. Is it possible?
>
> Thanks!
>
> Sincerely,
> Lin Wukang
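
A minimal Scala sketch of the persist/unpersist lifecycle discussed in this thread. The dataset, app name, and storage level are illustrative assumptions, not from the thread:

    // Sketch only: shows caching an RDD, reusing it across actions,
    // and then explicitly releasing it, as described above.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object CacheLifecycleSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("cache-lifecycle").setMaster("local[*]"))

        val rdd = sc.parallelize(1 to 1000000)
          .map(_ * 2)
          .persist(StorageLevel.MEMORY_ONLY) // marked for caching; materialized on first action

        println(rdd.count()) // first action: computes the RDD and caches its blocks
        println(rdd.sum())   // second action: reuses the cached blocks instead of recomputing

        // Mark the RDD non-persistent and remove its blocks from memory and disk.
        // blocking = true waits until the blocks are actually dropped; the JVM
        // still frees the underlying objects only at a later garbage collection.
        rdd.unpersist(blocking = true)

        sc.stop()
      }
    }

Without the explicit unpersist, the cached blocks would simply sit in memory until Spark's LRU eviction needed the space for newer RDDs.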