Say I have two RDDs, RDD1 and RDD2, and each takes 20 GB in memory.

I cache both of them using RDD1.cache() and RDD2.cache(). In the later steps of my app I never use RDD1 again, but I use RDD2 many times.

Here is my question: if my cluster has only 40 GB of memory and I have a third RDD, RDD3, which is also 20 GB, what happens if I cache it with RDD3.cache()? The documentation says cache() uses the default storage level, MEMORY_ONLY, under which partitions that do not fit in memory are not cached and are recomputed on the fly each time they are needed. So as I understand it, RDD3 is not guaranteed to be cached and may be recomputed every time it is used.

I'm a little confused: will Spark remove RDD1 for me and put RDD3 in memory instead? Or is there a concept like a "priority cache" in Spark?

Many thanks!

--
a spark lover, a quant, a developer and a good man.
http://github.com/litaotao