I'm not sure that "checkpointed" means the same thing in that sentence.
You can run a simple test using `spark-shell`: sc.setCheckpointDir("/tmp/checkpoint") val rdd = sc.parallelize(1 to 10).map(x => { Thread.sleep(1000) x }) rdd.checkpoint() rdd.foreach(println) // Will take 10 seconds rdd.foreach(println) // Will be instant, because the RDD is checkpointed On Wed, Aug 2, 2017 at 7:05 PM, jeff saremi <jeffsar...@hotmail.com> wrote: > Vadim: > > This is from the Mastering Spark book: > > *"It is strongly recommended that a checkpointed RDD is persisted in > memory, otherwise saving it on a file will require recomputation."* > > > To me that means checkpoint will not prevent the recomputation that i was > hoping for > ------------------------------ > *From:* Vadim Semenov <vadim.seme...@datadoghq.com> > *Sent:* Tuesday, August 1, 2017 12:05:17 PM > *To:* jeff saremi > *Cc:* user@spark.apache.org > *Subject:* Re: How can i remove the need for calling cache > > You can use `.checkpoint()`: > ``` > val sc: SparkContext > sc.setCheckpointDir("hdfs:///tmp/checkpointDirectory") > myrdd.checkpoint() > val result1 = myrdd.map(op1(_)) > result1.count() // Will save `myrdd` to HDFS and do map(op1… > val result2 = myrdd.map(op2(_)) > result2.count() // Will load `myrdd` from HDFS and do map(op2… > ``` > > On Tue, Aug 1, 2017 at 2:05 PM, jeff saremi <jeffsar...@hotmail.com> > wrote: > >> Calling cache/persist fails all our jobs (i have posted 2 threads on >> this). >> >> And we're giving up hope in finding a solution. >> So I'd like to find a workaround for that: >> >> If I save an RDD to hdfs and read it back, can I use it in more than one >> operation? >> >> Example: (using cache) >> // do a whole bunch of transformations on an RDD >> >> myrdd.cache() >> >> val result1 = myrdd.map(op1(_)) >> >> val result2 = myrdd.map(op2(_)) >> >> // in the above I am assuming that a call to cache will prevent all >> previous transformation from being calculated twice >> >> I'd like to somehow get result1 and result2 without duplicating work. How >> can I do that? >> >> thanks >> >> Jeff >> > >