Yes, otherwise you can try:
    rdd.cache().count()

and then run your benchmark.

Paolo

From: Daniel Darabos <daniel.dara...@lynxanalytics.com>
Sent: Wednesday, 3 December 2014 12:28
To: shahab <shahab.mok...@gmail.com>
Cc: user@spark.apache.org

On Wed, Dec 3, 2014 at 10:52 AM, shahab <shahab.mok...@gmail.com> wrote:

> Hi,
>
> I noticed that rdd.cache() does not happen immediately; rather, due to
> Spark's lazy evaluation, it happens only at the moment you perform some
> map/reduce action. Is this true?

Yes, this is correct.

> If this is the case, how can I force Spark to cache immediately at the
> cache() statement? I need this to do some benchmarking, and I need to
> separate the RDD caching time from the RDD transformation/action
> processing time.

The typical solution, I think, is to run rdd.foreach(_ => ()) to trigger a calculation.
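
For completeness, a minimal sketch of the pattern described above, assuming
a local SparkContext; the example RDD, the CacheBenchmark object name, and
the timing code are illustrative, not part of the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    object CacheBenchmark {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("cache-benchmark").setMaster("local[*]"))

        val rdd = sc.parallelize(1 to 1000000).map(_.toLong * 2)

        // cache() only marks the RDD for caching; nothing is computed yet.
        rdd.cache()

        // Force materialization so the caching time is kept separate from
        // the benchmark itself. rdd.count() would work equally well here.
        val cacheStart = System.nanoTime()
        rdd.foreach(_ => ())
        println(s"Caching took ${(System.nanoTime() - cacheStart) / 1e6} ms")

        // This action now reads from the in-memory cache.
        val benchStart = System.nanoTime()
        val total = rdd.reduce(_ + _)
        println(s"Action took ${(System.nanoTime() - benchStart) / 1e6} ms, result = $total")

        sc.stop()
      }
    }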