Hi all,
I have a scenario where a web application submits multiple jobs to Spark.
These jobs may operate on the same RDD.

Is it possible to cache() the RDD during one call,
so that all subsequent calls can use the cached RDD?

Basically, during one invocation:
   val rdd1 = sparkContext1.textFile(file1).cache()

and during another invocation:
    val rdd2 = sparkContext2.textFile(file1).cache()

(Note that the SparkContexts are different, but the file is the same.)

Will the same file be loaded again in the second SparkContext,
or will there be only one cached copy (since RDDs are immutable)?
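
For comparison, my understanding of the single-context case is that the
cache is tied to the specific RDD object, so reuse means holding on to the
reference rather than calling textFile() again. A rough sketch of what I
mean (sharedContext and the file path are just placeholder names, not my
actual setup):

    import org.apache.spark.{SparkConf, SparkContext}

    // One SparkContext shared across all web requests (placeholder setup;
    // local master just for illustration)
    val conf = new SparkConf().setAppName("web-app").setMaster("local[*]")
    val sharedContext = new SparkContext(conf)
    val file1 = "hdfs:///path/to/file1"  // illustrative path

    // First invocation: build the RDD, mark it for caching, materialize it
    val rdd1 = sharedContext.textFile(file1).cache()
    rdd1.count()

    // Later invocations: reuse the same reference (e.g. kept in a map keyed
    // by file name); calling textFile(file1) again would create a new,
    // uncached RDD even on the same context
    val rowCount = rdd1.count()  // served from the cache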

thanks!
Sujee Maniyam (http://sujee.net | http://www.linkedin.com/in/sujeemaniyam )
