Hi, you're right: with the default MEMORY_ONLY level, rdd3 is not fully cached, and the partitions that don't fit in memory are recomputed each time they are needed. With MEMORY_AND_DISK, those partitions are written to disk instead. Also, the current Spark does not automatically unpersist RDDs based on how often they are used, so you need to call unpersist() yourself to free the memory held by rdd1.
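For example, something like this (a minimal sketch in Scala; rdd1 and rdd3 stand in for the RDDs from your question):

  import org.apache.spark.storage.StorageLevel

  // free the ~20g held by rdd1, since it is no longer used
  rdd1.unpersist()

  // cache rdd3 so that partitions that don't fit in memory spill to disk
  rdd3.persist(StorageLevel.MEMORY_AND_DISK)

  // any action (e.g. count) materializes the cache
  rdd3.count()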
On Fri, Feb 5, 2016 at 12:15 PM, charles li <charles.up...@gmail.com> wrote:
> say I have 2 RDDs, RDD1 and RDD2.
>
> both are 20g in memory.
>
> and I cache both of them in memory using RDD1.cache() and RDD2.cache()
>
> then in the further steps of my app, I never use RDD1 but use RDD2 lots of times.
>
> so here is my question:
>
> if there is only 40g of memory in my cluster, and I have another RDD, RDD3, of 20g, what happens if I cache RDD3 using RDD3.cache()?
>
> as the document says, cache uses the default storage level, MEMORY_ONLY. it means that it will not necessarily cache RDD3 but may re-compute it every time it is used.
>
> I feel a little confused: will spark remove RDD1 for me and put RDD3 in memory?
>
> or is there any concept like a "priority cache" in spark?
>
> great thanks
>
> --
> *--------------------------------------*
> a spark lover, a quant, a developer and a good man.
>
> http://github.com/litaotao

--
---
Takeshi Yamamuro