Re: what happened if cache a RDD for multiple time?

2016-03-24 Thread charles li
Hi Yash, that really helps me, many thanks. On Thu, Mar 24, 2016 at 7:07 PM, yash datta wrote: > Yes, That is correct. > > When you call cache on an RDD, internally it calls > persist(StorageLevel.MEMORY_ONLY) which further calls > > persist(StorageLevel.MEMORY_ONLY,

Re: what happened if cache a RDD for multiple time?

2016-03-24 Thread yash datta
Yes, that is correct. When you call cache on an RDD, internally it calls persist(StorageLevel.MEMORY_ONLY), which in turn calls persist(StorageLevel.MEMORY_ONLY, allowOverride=false) if the RDD is not marked for localCheckpointing. Below is what is finally triggered: /** * Mark this RDD for
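
To make the call chain in this reply concrete, here is a small toy model in plain Python. It is not Spark code and does not use the Spark API: `ToyRDD`, its `registrations` counter, and the use of `RuntimeError` are hypothetical names chosen for illustration. It assumes that the final persist step (a) refuses to change a storage level that is already set unless overriding is allowed, and (b) registers the RDD for caching only the first time a level is assigned. The local-checkpointing branch mentioned in the reply is left out.

```python
from enum import Enum


class StorageLevel(Enum):
    # Only the levels needed for this illustration.
    NONE = "NONE"
    MEMORY_ONLY = "MEMORY_ONLY"
    MEMORY_AND_DISK = "MEMORY_AND_DISK"


class ToyRDD:
    """Hypothetical stand-in for an RDD; models only storage-level bookkeeping."""

    def __init__(self):
        self.storage_level = StorageLevel.NONE
        self.registrations = 0  # times this object was registered for caching

    def _persist(self, new_level, allow_override):
        already_set = self.storage_level is not StorageLevel.NONE
        # Assumption: changing an assigned level is rejected unless overridden.
        if already_set and new_level is not self.storage_level and not allow_override:
            raise RuntimeError("cannot change storage level once assigned")
        # Assumption: registration happens only on the first assignment.
        if not already_set:
            self.registrations += 1
        self.storage_level = new_level
        return self

    def persist(self, new_level=StorageLevel.MEMORY_ONLY):
        # Public entry point: never allows an override in this model.
        return self._persist(new_level, allow_override=False)

    def cache(self):
        # cache() is shorthand for persisting at MEMORY_ONLY.
        return self.persist(StorageLevel.MEMORY_ONLY)


if __name__ == "__main__":
    rdd = ToyRDD()
    rdd.cache()
    rdd.cache()  # second call with the same level changes nothing
    print(rdd.storage_level.name, rdd.registrations)
```

Under these assumptions, caching the same RDD a second time at the same level leaves its state unchanged, while asking for a different level afterwards is rejected.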

what happened if cache a RDD for multiple time?

2016-03-24 Thread charles li
I happened to see this question on Stack Overflow: http://stackoverflow.com/questions/36195105/what-happens-if-i-cache-the-same-rdd-twice-in-spark/36195812#36195812 I think it's very interesting, and the answer posted by Aaron sounds promising, but I'm not sure, and I can't find the details