[ 
https://issues.apache.org/jira/browse/SPARK-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186418#comment-14186418
 ] 

Jie Huang commented on SPARK-4094:
----------------------------------

Yes, we have run into a similar issue. According to the documentation, 
checkpoint() is supported only when it is called before an action. The problem 
shows up when you have a lineage like the one below.
{noformat}
A-- B--C(action)
    |--D(action)
{noformat}
If you submit action C and then checkpoint B before submitting action D, like this:
*C*
*B.checkpoint*
*D*

then RDD B cannot be checkpointed. This does not align with the documentation 
or with the original design.
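
The sequence above can be sketched as a small standalone program. This is only 
an illustration under assumptions, not code taken from the issue: the object 
name, the local master, the temporary checkpoint directory and the concrete 
transformations standing in for A, B, C and D are all made up for the example. 
Going by the behaviour reported here for 1.1.0, the last line would be expected 
to report that B was not checkpointed.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reproduction of the C / B.checkpoint / D sequence.
object CheckpointAfterActionSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders chosen for this sketch.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("SPARK-4094-sketch")
    val sc = new SparkContext(conf)
    try {
      // checkpoint() needs a checkpoint directory; a temp dir is assumed here.
      sc.setCheckpointDir(Files.createTempDirectory("spark-4094").toString)

      val a = sc.makeRDD(1 to 100)   // A
      val b = a.map(_ * 2)           // B, the shared parent of both actions

      b.filter(_ % 3 == 0).count()   // action C runs first
      b.checkpoint()                 // checkpoint requested after C, before D
      b.map(_ + 1).count()           // action D

      // Reports whether the checkpoint request on B took effect.
      println(s"B checkpointed after D: ${b.isCheckpointed}")
    } finally {
      sc.stop()
    }
  }
}
```

Moving the b.checkpoint() call above action C is the ordering the documentation 
describes, so comparing the two runs should show the difference.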

> checkpoint should still be available after rdd actions
> ------------------------------------------------------
>
>                 Key: SPARK-4094
>                 URL: https://issues.apache.org/jira/browse/SPARK-4094
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Zhang, Liye
>
> rdd.checkpoint() must be called before any action on this RDD. If any action 
> has already run, the checkpoint will never succeed. Take the following code 
> as an example:
> *rdd = sc.makeRDD(...)*
> *rdd.collect()*
> *rdd.checkpoint()*
> *rdd.count()*
> This RDD would never be checkpointed. RDD cache does not have this problem: 
> cache() always takes effect for subsequent actions, no matter whether any 
> actions ran before cache() was called.
> So rdd.checkpoint() should behave the same way as rdd.cache().


