Just load it as you would from any other directory.
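To make that concrete: in a non-streaming job Spark does not reload checkpoint data automatically on restart, so the application itself has to check whether saved output already exists and load it instead of recomputing. A minimal sketch of that load-or-compute pattern in plain Scala with `java.io` (no Spark; the helper name and path are hypothetical):

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical helper: reuse a previously saved result if the file exists,
// otherwise compute it and save it so the next run can pick it up.
def loadOrCompute[T <: Serializable](path: String)(compute: => T): T = {
  val f = new File(path)
  if (f.exists()) {
    // Second run: read the saved result instead of recomputing.
    val in = new ObjectInputStream(new FileInputStream(f))
    try in.readObject().asInstanceOf[T] finally in.close()
  } else {
    // First run: compute, then persist for a later restart.
    val result = compute
    val out = new ObjectOutputStream(new FileOutputStream(f))
    try out.writeObject(result) finally out.close()
    result
  }
}
```

With Spark RDDs the same idea is usually expressed with `rdd.saveAsObjectFile(dir)` on the first run and `sc.objectFile(dir)` on later runs, guarded by a check that `dir` exists; the files written by `rdd.checkpoint()` use Spark's internal format and are not meant to be read back directly by user code.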
> On 26. May 2017, at 17:26, Priya PM <pmpr...@gmail.com> wrote:
>
>
> ---------- Forwarded message ----------
> From: Priya PM <pmpr...@gmail.com>
> Date: Fri, May 26, 2017 at 8:54 PM
> Subject: Re: Spark checkpoint - nonstreaming
> To: Jörn Franke <jornfra...@gmail.com>
>
>
> Oh, how do I do it? I don't see it mentioned anywhere in the documentation.
>
> I have followed this link
> https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/6-CacheAndCheckpoint.md
> to understand the checkpoint workflow.
>
> But it doesn't seem to work the way it is described below: on the second
> run, the checkpointed RDD is not read back.
>
> Q: How to read a checkpointed RDD?
>
> runJob() will call finalRDD.partitions() to determine how many tasks there
> will be. rdd.partitions() checks whether the RDD has been checkpointed via
> RDDCheckpointData, which manages the checkpointed RDD. If yes, it returns the
> partitions of the RDD (Array[Partition]). When rdd.iterator() is called to
> compute an RDD partition, computeOrReadCheckpoint(split: Partition) is also
> called to check whether the RDD is checkpointed. If yes, the parent RDD's
> iterator(), a.k.a. CheckpointRDD.iterator(), will be called. CheckpointRDD
> reads files on the file system to produce the RDD partition. That's why a
> parent CheckpointRDD is implicitly added to the checkpointed RDD.
>
>> On Fri, May 26, 2017 at 8:48 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>> Did you explicitly tell the application to read from the checkpoint
>> directory?
>> This you have to do in non-streaming scenarios.
>>
>>> On 26. May 2017, at 16:52, Priya PM <pmpr...@gmail.com> wrote:
>>>
>>> Yes, I did set the checkpoint directory. I could see the checkpointed RDD
>>> too.
>>>
>>> [root@ rdd-28]# pwd
>>> /root/checkpointDir/9dd1acf0-bef8-4a4f-bf0e-f7624334abc5/rdd-28
>>>
>>> I am using the MovieLens application to check Spark's checkpointing feature.
>>>
>>> Code: MovieLensALS.scala
>>>
>>> def main(args: Array[String]) {
>>> ..
>>> ..
>>> sc.setCheckpointDir("/root/checkpointDir")
>>> }
>>>
>>>> On Fri, May 26, 2017 at 8:09 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>>> Do you have some source code?
>>>> Did you set the checkpoint directory?
>>>>
>>>> > On 26. May 2017, at 16:06, Priya <pmpr...@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > In a non-streaming Spark application, I checkpointed an RDD and could
>>>> > see the RDD getting checkpointed. I killed the application after
>>>> > checkpointing the RDD and restarted the same application again
>>>> > immediately, but it doesn't seem to pick up from the checkpoint: it
>>>> > checkpoints the RDD again. Could anyone please explain why I am seeing
>>>> > this behavior, and why it is not picking up from the checkpoint and
>>>> > proceeding from there on the second run of the same application? It
>>>> > would really help me understand the Spark checkpoint workflow if I
>>>> > could get some clarity on this behavior. Please let me know if I am
>>>> > missing something.
>>>> >
>>>> > [root@checkpointDir]# ls
>>>> > 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5
>>>> > a4f14f43-e7c3-4f64-a980-8483b42bb11d
>>>> >
>>>> > [root@9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# ls -la
>>>> > total 0
>>>> > drwxr-xr-x. 3 root root  20 May 26 16:26 .
>>>> > drwxr-xr-x. 4 root root  94 May 26 16:24 ..
>>>> > drwxr-xr-x. 2 root root 133 May 26 16:26 rdd-28
>>>> >
>>>> > [root@priya-vm 9dd1acf0-bef8-4a4f-bf0e-f7624334abc5]# cd rdd-28/
>>>> > [root@priya-vm rdd-28]# ls
>>>> > part-00000  part-00001  _partitioner
>>>> >
>>>> > Thanks
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-checkpoint-nonstreaming-tp28712.html
>>>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>> >
>>>> > ---------------------------------------------------------------------
>>>> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
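The computeOrReadCheckpoint flow quoted in the thread can be modelled in a few lines of plain Scala (a simplified sketch, not Spark; the class and member names are made up for illustration):

```scala
// Simplified model of RDD.iterator(): if checkpoint data has been
// materialized, read the partition from it; otherwise recompute the lineage.
class MiniRDD[T](computePartition: Int => Seq[T]) {
  // Stands in for RDDCheckpointData: set once checkpointing has run.
  var checkpointData: Option[Map[Int, Seq[T]]] = None

  def iterator(split: Int): Seq[T] =
    checkpointData match {
      case Some(parts) => parts(split)           // read from the "CheckpointRDD"
      case None        => computePartition(split) // recompute from lineage
    }

  def checkpoint(numPartitions: Int): Unit =
    checkpointData = Some(
      (0 until numPartitions).map(i => i -> computePartition(i)).toMap)
}
```

The sketch also shows why a restarted application recomputes: the checkpoint state lives in the driver's in-memory RDD object, so a fresh JVM starts with it empty, and the part-00000/part-00001 files on disk are only used if the application is told to read them.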