Hi all,
I have a scenario where a web application submits multiple jobs to Spark.
These jobs may operate on the same RDD.

Is it possible to cache() the RDD during one call,
so that all subsequent calls can use the cached RDD?

Basically, during one invocation:
   val rdd1 = sparkContext1.textFile(file1).cache()

and during another invocation:
    val rdd2 = sparkContext2.textFile(file1).cache()

(Note that the SparkContexts are different, but the file is the same.)

Will the same file be loaded again in the second SparkContext,
or will there be only one cached copy (since RDDs are immutable)?
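
For comparison, my understanding of the single-context case is that the
cache is tied to the specific RDD object, so reuse means holding on to the
reference rather than calling textFile() again. A rough sketch of what I
mean (sharedContext and the file path are just placeholder names, not my
actual setup):

    import org.apache.spark.{SparkConf, SparkContext}

    // One SparkContext shared across all web requests (placeholder setup;
    // local master just for illustration)
    val conf = new SparkConf().setAppName("web-app").setMaster("local[*]")
    val sharedContext = new SparkContext(conf)
    val file1 = "hdfs:///path/to/file1"  // illustrative path

    // First invocation: build the RDD, mark it for caching, materialize it
    val rdd1 = sharedContext.textFile(file1).cache()
    rdd1.count()

    // Later invocations: reuse the same reference (e.g. kept in a map keyed
    // by file name); calling textFile(file1) again would create a new,
    // uncached RDD even on the same context
    val rowCount = rdd1.count()  // served from the cache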

thanks!
Sujee Maniyam (http://sujee.net | http://www.linkedin.com/in/sujeemaniyam )
