Just load it as you would from any other directory.
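In a non-streaming job, nothing reloads the checkpoint for you on restart; the application itself has to check the directory and branch. A minimal sketch of that pattern (hedged: `load` and `compute` are hypothetical placeholders standing in for "read the saved data" and "recompute and checkpoint it", not Spark API):

```scala
import java.nio.file.{Files, Paths}

// Hedged sketch: decide at startup whether usable checkpoint data already
// exists and branch accordingly. load/compute are hypothetical placeholders,
// not Spark API.
def resumeOrCompute[T](checkpointDir: String,
                       load: String => T,
                       compute: String => T): T = {
  val dir = Paths.get(checkpointDir)
  // "has data" here just means the directory exists and is non-empty
  val hasData = Files.isDirectory(dir) && Files.list(dir).findAny.isPresent
  if (hasData) load(checkpointDir)   // second run: reuse what was saved
  else compute(checkpointDir)        // first run: compute and persist
}
```

The point is only that the branch is the application's responsibility; Spark does not perform it automatically for batch jobs.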
> On 26. May 2017, at 17:26, Priya PM <pmpr...@gmail.com> wrote:
>
>
> ---------- Forwarded message ----------
> From: Priya PM <pmpr...@gmail.com>
> Date: Fri, May 26, 2017 at 8:54 PM
> Subject: Re: Spark checkpoint - nonstreaming
> To: Jörn Franke <jornfra...@gmail.com>
>
>
> Oh, how do I do it? I don't see it mentioned anywhere in the documentation.
>
> I have followed this link
> https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/6-CacheAndCheckpoint.md
> to understand the checkpoint workflow.
>
> But on the second run it doesn't seem to read from the checkpointed RDD the
> way it is described below.
>
>
>
> Q: How is a checkpointed RDD read?
>
> runJob() calls finalRDD.partitions() to determine how many tasks there
> will be. rdd.partitions() checks whether the RDD has been checkpointed via
> RDDCheckpointData, which manages the checkpointed RDD. If yes, it returns the
> partitions of the RDD (Array[Partition]). When rdd.iterator() is called to
> compute an RDD partition, computeOrReadCheckpoint(split: Partition) is also
> called to check whether the RDD is checkpointed. If yes, the parent RDD's
> iterator(), i.e. CheckpointRDD.iterator(), is called. CheckpointRDD reads
> files on the file system to produce the RDD partition. That's why a parent
> CheckpointRDD is quietly added to the checkpointed RDD.
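The decision described above can be modeled in a few lines of plain Scala. This is an illustrative model of the control flow only, not Spark's actual source:

```scala
// Illustrative model, not Spark source: what rdd.iterator() effectively
// decides per partition via computeOrReadCheckpoint.
sealed trait PartitionSource
// parent CheckpointRDD reads the saved partition files
case object ReadCheckpointFiles extends PartitionSource
// fall back to the normal compute path through the lineage
case object RecomputeFromLineage extends PartitionSource

def computeOrReadCheckpoint(isCheckpointed: Boolean): PartitionSource =
  if (isCheckpointed) ReadCheckpointFiles else RecomputeFromLineage
```

Within one application run this branch is what makes checkpointing transparent; it says nothing about a freshly started application finding the old files.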
>
>
>> On Fri, May 26, 2017 at 8:48 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>> Did you explicitly tell the application to read from the checkpoint
>> directory ?
>> You have to do this explicitly in non-streaming scenarios.
>>
>>> On 26. May 2017, at 16:52, Priya PM <pmpr...@gmail.com> wrote:
>>>
>>> Yes, I did set the checkpoint directory. I could see the checkpointed RDD
>>> too.
>>>
>>> [root@ rdd-28]# pwd
>>> /root/checkpointDir/9dd1acf0-bef8-4a4f-bf0e-f7624334abc5/rdd-28
>>>
>>> I am using the MovieLens application to test Spark's checkpointing feature.
>>>
>>> code: MovieLensALS.scala
>>>
>>> def main(args: Array[String]) {
>>>   ..
>>>   ..
>>>   sc.setCheckpointDir("/root/checkpointDir")
>>> }
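One detail worth noting here, as far as I understand Spark's behavior (hedged sketch, not a quote of its source): setCheckpointDir creates a fresh, randomly named UUID subdirectory per SparkContext, which is why each run in the listings below writes under a different directory and a restarted application never even looks at the previous run's files. Roughly:

```scala
import java.util.UUID

// Illustrative model: each application run derives its own UUID-named
// subdirectory under the configured checkpoint dir, so two runs never
// share checkpoint files.
def checkpointPathForRun(baseDir: String): String =
  s"$baseDir/${UUID.randomUUID()}"
```

That alone would explain the observed behavior: the second run checkpoints again because it is writing into a brand-new subdirectory, not reading the old one.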
>>>
>>>
>>>
>>>> On Fri, May 26, 2017 at 8:09 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>> Do you have some source code?
>>>> Did you set the checkpoint directory ?
>>>>
>>>> > On 26. May 2017, at 16:06, Priya <pmpr...@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > In a non-streaming Spark application, I checkpointed an RDD and could
>>>> > see it getting checkpointed. I killed the application after the RDD was
>>>> > checkpointed and immediately restarted the same application, but it
>>>> > does not pick up from the checkpoint; it checkpoints the RDD again.
>>>> > Could anyone explain why I am seeing this behavior, i.e. why the second
>>>> > run does not resume from the checkpoint and proceed from there? Some
>>>> > clarity on this would really help me understand the Spark checkpoint
>>>> > workflow. Please let me know if I am missing something.
>>>> >
>>>> > [root@checkpointDir]# ls
>>>> > 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5
>>>> > a4f14f43-e7c3-4f64-a980-8483b42bb11d
>>>> >
>>>> > [root@9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# ls -la
>>>> > total 0
>>>> > drwxr-xr-x. 3 root root 20 May 26 16:26 .
>>>> > drwxr-xr-x. 4 root root 94 May 26 16:24 ..
>>>> > drwxr-xr-x. 2 root root 133 May 26 16:26 rdd-28
>>>> >
>>>> > [root@priya-vm 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# cd rdd-28/
>>>> > [root@priya-vm rdd-28]# ls
>>>> > part-0 part-1 _partitioner
>>>> >
>>>> > Thanks
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-checkpoint-nonstreaming-tp28712.html
>>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>> >
>>>> >
>>>
>
>