[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143342#comment-14143342 ]

Sean Owen commented on SPARK-3625:
----------------------------------

It still prints 1000 both times, which is correct. Your assertion is about 
something different: it fails, but the behavior it asserts is not what the 
javadoc suggests:

{quote}
Mark this RDD for checkpointing. It will be saved to a file inside the 
checkpoint
directory set with SparkContext.setCheckpointDir() and all references to its 
parent
RDDs will be removed. This function must be called before any job has been
executed on this RDD. It is strongly recommended that this RDD is persisted in
memory, otherwise saving it on a file will require recomputation.
{quote}

This example calls count() before checkpoint(). If you call checkpoint() first, 
I think you get the expected behavior, since the dependency becomes a 
CheckpointRDD. This does not look like a bug.
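
To illustrate, here is a sketch of the reporter's snippet with the order of
operations fixed; it assumes a live SparkContext named {{sc}} and a writable
{{checkpointDir}}, as in the original report, and is not a standalone program:

{code}
// Sketch only: requires a running SparkContext (sc) and a valid
// checkpoint directory. Same as the repro code, but checkpoint() is
// marked before the first action, as the javadoc requires.
sc.setCheckpointDir(checkpointDir)
val c = sc.parallelize(1 to 1000).map(_ + 1)
val dep = c.dependencies.head.rdd

c.checkpoint() // mark for checkpointing BEFORE any job runs on this RDD
c.count()      // first action: materializes the checkpoint

// The lineage is now truncated: the parent has been replaced by a
// CheckpointRDD, so the original dependency is gone.
assert(dep != c.dependencies.head.rdd)
{code}

Because count() in the original snippet already ran a job before checkpoint()
was called, the lineage had been used and the dependency was never replaced.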

> In some cases, the RDD.checkpoint does not work
> -----------------------------------------------
>
>                 Key: SPARK-3625
>                 URL: https://issues.apache.org/jira/browse/SPARK-3625
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Guoqiang Li
>            Assignee: Guoqiang Li
>            Priority: Blocker
>
> The reproduce code:
> {code}
>     sc.setCheckpointDir(checkpointDir)
>     val c = sc.parallelize((1 to 1000)).map(_ + 1)
>     c.count
>     val dep = c.dependencies.head.rdd
>     c.checkpoint()
>     c.count
>     assert(dep != c.dependencies.head.rdd)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
