Just load it as you would from any other directory.
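To make that concrete: in a non-streaming job Spark does not reload checkpoint data automatically on restart, so the application itself has to check whether saved output already exists and load it instead of recomputing. A minimal sketch of that load-or-compute pattern in plain Scala with `java.io` (no Spark; the helper name and path are hypothetical):

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper: reuse a previously saved result if the file exists,
// otherwise compute it and save it so the next run can pick it up.
def loadOrCompute[T <: Serializable](path: String)(compute: => T): T = {
  val f = new File(path)
  if (f.exists()) {
    // Second run: read the saved result instead of recomputing.
    val in = new ObjectInputStream(new FileInputStream(f))
    try in.readObject().asInstanceOf[T] finally in.close()
  } else {
    // First run: compute, then persist for a later restart.
    val result = compute
    val out = new ObjectOutputStream(new FileOutputStream(f))
    try out.writeObject(result) finally out.close()
    result
  }
}
```

With Spark RDDs the same idea is usually expressed with `rdd.saveAsObjectFile(dir)` on the first run and `sc.objectFile(dir)` on later runs, guarded by a check that `dir` exists; the files written by `rdd.checkpoint()` use Spark's internal format and are not meant to be read back directly by user code.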
> On 26. May 2017, at 17:26, Priya PM <pmpr...@gmail.com> wrote:
>
>
> ---------- Forwarded message ----------
> From: Priya PM <pmpr...@gmail.com>
> Date: Fri, May 26, 2017 at 8:54 PM
> Subject: Re: Spark checkpoint - nonstreaming
> To: Jörn Franke <jornfra...@gmail.com>
>
>
> Oh, how do I do it? I don't see it mentioned anywhere in the documentation.
>
> I have followed this link
> https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/6-CacheAndCheckpoint.md
> to understand the checkpoint workflow.
>
> But it doesn't seem to work the way it is described below: on the second
> run, the checkpointed RDD is not read back.
>
> Q: How to read a checkpointed RDD?
>
> runJob() will call finalRDD.partitions() to determine how many tasks there
> will be. rdd.partitions() checks whether the RDD has been checkpointed via
> RDDCheckpointData, which manages the checkpointed RDD. If yes, it returns the
> partitions of the RDD (Array[Partition]). When rdd.iterator() is called to
> compute an RDD partition, computeOrReadCheckpoint(split: Partition) is also
> called to check whether the RDD is checkpointed. If yes, the parent RDD's
> iterator(), a.k.a. CheckpointRDD.iterator(), will be called. CheckpointRDD
> reads files on the file system to produce the RDD partition. That's why a
> parent CheckpointRDD is implicitly added to the checkpointed RDD.
>
>> On Fri, May 26, 2017 at 8:48 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>> Did you explicitly tell the application to read from the checkpoint
>> directory?
>> This you have to do in non-streaming scenarios.
>>
>>> On 26. May 2017, at 16:52, Priya PM <pmpr...@gmail.com> wrote:
>>>
>>> Yes, I did set the checkpoint directory. I could see the checkpointed RDD
>>> too.
>>>
>>> [root@ rdd-28]# pwd
>>> /root/checkpointDir/9dd1acf0-bef8-4a4f-bf0e-f7624334abc5/rdd-28
>>>
>>> I am using the MovieLens application to check Spark's checkpointing feature.
>>>
>>> Code: MovieLensALS.scala
>>>
>>> def main(args: Array[String]) {
>>> ..
>>> ..
>>> sc.setCheckpointDir("/root/checkpointDir")
>>> }
>>>
>>>> On Fri, May 26, 2017 at 8:09 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>> Do you have some source code?
>>>> Did you set the checkpoint directory?
>>>>
>>>> > On 26. May 2017, at 16:06, Priya <pmpr...@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > In a non-streaming Spark application, I checkpointed an RDD and could
>>>> > see the RDD getting checkpointed. I killed the application after
>>>> > checkpointing the RDD and restarted the same application again
>>>> > immediately, but it doesn't seem to pick up from the checkpoint: it
>>>> > checkpoints the RDD again. Could anyone please explain why I am seeing
>>>> > this behavior, and why it is not picking up from the checkpoint and
>>>> > proceeding from there on the second run of the same application? It
>>>> > would really help me understand the Spark checkpoint workflow if I
>>>> > could get some clarity on this behavior. Please let me know if I am
>>>> > missing something.
>>>> >
>>>> > [root@checkpointDir]# ls
>>>> > 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5
>>>> > a4f14f43-e7c3-4f64-a980-8483b42bb11d
>>>> >
>>>> > [root@9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# ls -la
>>>> > total 0
>>>> > drwxr-xr-x. 3 root root  20 May 26 16:26 .
>>>> > drwxr-xr-x. 4 root root  94 May 26 16:24 ..
>>>> > drwxr-xr-x. 2 root root 133 May 26 16:26 rdd-28
>>>> >
>>>> > [root@priya-vm 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# cd rdd-28/
>>>> > [root@priya-vm rdd-28]# ls
>>>> > part-00000  part-00001  _partitioner
>>>> >
>>>> > Thanks
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-checkpoint-nonstreaming-tp28712.html
>>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>> >
>>>> > ---------------------------------------------------------------------
>>>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
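The computeOrReadCheckpoint flow quoted in the thread can be modelled in a few lines of plain Scala (a simplified sketch, not Spark; the class and member names are made up for illustration):

```scala
// Simplified model of RDD.iterator(): if checkpoint data has been
// materialized, read the partition from it; otherwise recompute the lineage.
class MiniRDD[T](computePartition: Int => Seq[T]) {
  // Stands in for RDDCheckpointData: set once checkpointing has run.
  var checkpointData: Option[Map[Int, Seq[T]]] = None

  def iterator(split: Int): Seq[T] =
    checkpointData match {
      case Some(parts) => parts(split)           // read from the "CheckpointRDD"
      case None        => computePartition(split) // recompute from lineage
    }

  def checkpoint(numPartitions: Int): Unit =
    checkpointData = Some(
      (0 until numPartitions).map(i => i -> computePartition(i)).toMap)
}
```

The sketch also shows why a restarted application recomputes: the checkpoint state lives in the driver's in-memory RDD object, so a fresh JVM starts with it empty, and the part-00000/part-00001 files on disk are only used if the application is told to read them.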