Say I have two RDDs, RDD1 and RDD2.

Both are 20 GB in memory.

I cache both of them in memory using RDD1.cache() and RDD2.cache().


Then in the later steps of my app, I never use RDD1 again, but I use RDD2
many times.


Here is my question:

If there is only 40 GB of memory in my cluster, and I have another 20 GB
RDD, RDD3, what happens if I cache it using RDD3.cache()?
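
For concreteness, here is a minimal sketch of the setup (Scala, as in
spark-shell with its predefined SparkContext sc; the HDFS paths and the
~20 GB sizes are hypothetical placeholders):

    // three RDDs, each roughly 20 GB when materialized
    val rdd1 = sc.textFile("hdfs:///input1").cache()   // used once, early on
    val rdd2 = sc.textFile("hdfs:///input2").cache()   // reused many times
    val rdd3 = sc.textFile("hdfs:///input3").cache()   // cached last

    rdd1.count()                           // materializes rdd1 into the cache
    (1 to 10).foreach(_ => rdd2.count())   // rdd2 is hit over and over
    rdd3.count()                           // ~60 GB requested, ~40 GB available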


As the documentation says, cache() uses the default storage level,
MEMORY_ONLY. That means RDD3 is not guaranteed to be cached: partitions
that do not fit in memory are not stored at all, and are recomputed from
lineage every time they are used.
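
My understanding of the storage levels, as a sketch (same hypothetical
input path; MEMORY_ONLY and MEMORY_AND_DISK are the constants from
org.apache.spark.storage.StorageLevel):

    import org.apache.spark.storage.StorageLevel

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY):
    // partitions that do not fit in memory are simply not stored, and are
    // recomputed from lineage each time they are needed.
    val rdd3MemOnly = sc.textFile("hdfs:///input3").persist(StorageLevel.MEMORY_ONLY)

    // With MEMORY_AND_DISK, partitions that do not fit are spilled to local
    // disk instead of being recomputed. (An RDD's storage level cannot be
    // changed once assigned, so this is shown on a separate RDD.)
    val rdd3Spill = sc.textFile("hdfs:///input3").persist(StorageLevel.MEMORY_AND_DISK)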

I am a little confused: will Spark evict RDD1 for me and put RDD3 in
memory instead?

Or is there any concept like a "priority cache" in Spark?
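
For what it's worth, I know I could evict RDD1 by hand along these lines
(a sketch, reusing the names from above):

    // unpersist() removes an RDD's cached blocks; blocking = true waits
    // until the blocks are actually gone before returning.
    rdd1.unpersist(blocking = true)   // frees rdd1's ~20 GB of cache
    rdd3.cache()
    rdd3.count()                      // rdd3 can now be fully materialized

but I would like to know whether Spark does anything like this
automatically when memory runs out.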


Many thanks.



-- 
*--------------------------------------*
a spark lover, a quant, a developer and a good man.

http://github.com/litaotao
