Yes,

otherwise you can try:

rdd.cache().count()

and then run your benchmark
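
For example, something like this in spark-shell (just a sketch; the input path and the map/reduce step below are placeholders for your own pipeline):

val rdd = sc.textFile("hdfs:///tmp/data.txt")     // hypothetical input
rdd.cache()

val t0 = System.nanoTime()
rdd.count()                                       // first action: computes the RDD and fills the cache
val cachingMs = (System.nanoTime() - t0) / 1e6

val t1 = System.nanoTime()
val totalChars = rdd.map(_.length).reduce(_ + _)  // now served from the cache
val processingMs = (System.nanoTime() - t1) / 1e6

println(s"caching: $cachingMs ms, processing: $processingMs ms")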

Paolo

From: Daniel Darabos <daniel.dara...@lynxanalytics.com>
Sent: Wednesday, 3 December 2014 12:28
To: shahab <shahab.mok...@gmail.com>
Cc: user@spark.apache.org



On Wed, Dec 3, 2014 at 10:52 AM, shahab
<shahab.mok...@gmail.com> wrote:
Hi,

I noticed that rdd.cache() does not happen immediately; rather, due to Spark's
lazy evaluation, it happens only at the moment you perform some map/reduce
action. Is this true?

Yes, this is correct.

If this is the case, how can I force Spark to cache immediately at the
cache() statement? I need this for benchmarking, where I want to separate
RDD caching time from RDD transformation/action processing time.

The typical solution, I think, is to run rdd.foreach(_ => ()) to trigger the
calculation.
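
For example (again just a sketch, assuming you have already called rdd.cache()):

rdd.foreach(_ => ())   // runs the computation and populates the cache, but returns nothing to the driver

Any action you time after this should only measure processing, not caching.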
