[ 
https://issues.apache.org/jira/browse/SPARK-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-4094.
------------------------------
    Resolution: Won't Fix

> checkpoint should still be available after rdd actions
> ------------------------------------------------------
>
>                 Key: SPARK-4094
>                 URL: https://issues.apache.org/jira/browse/SPARK-4094
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Zhang, Liye
>            Assignee: Zhang, Liye
>
> rdd.checkpoint() must be called before any actions on this rdd, if there is 
> any other actions before, checkpoint would never succeed. For the following 
> code as example:
> *rdd = sc.makeRDD(...)*
> *rdd.collect()*
> *rdd.checkpoint()*
> *rdd.count()*
> This rdd would never be checkpointed. For algorithms that have many 
> iterations would have some problem. Such as graph algorithm, there will have 
> many iterations which will cause the RDD lineage very long. So RDD may need 
> checkpoint after a certain iteration number. And if there is also any action 
> within the iteration loop, the checkpoint() operation will never work for the 
> later iterations after the iteration which calls the action operation.
> But this would not happen for RDD cache. RDD cache would always make 
> successfully before rdd actions no matter whether there is any actions before 
> cache().
> So rdd.checkpoint() should also be with the same behavior with rdd.cache().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to