[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143342#comment-14143342 ]
Sean Owen commented on SPARK-3625:
----------------------------------

It still prints 1000 both times, which is correct. Your assertion is about something different. The assertion fails, but the behavior you are asserting is not what the javadoc suggests:

{quote}
Mark this RDD for checkpointing. It will be saved to a file inside the checkpoint directory set with SparkContext.setCheckpointDir() and all references to its parent RDDs will be removed. This function must be called before any job has been executed on this RDD. It is strongly recommended that this RDD is persisted in memory, otherwise saving it on a file will require recomputation.
{quote}

This example calls count() before checkpoint(). If you don't, I think you get the expected behavior, since the dependency becomes a CheckpointRDD. This does not look like a bug.

> In some cases, the RDD.checkpoint does not work
> -----------------------------------------------
>
>                 Key: SPARK-3625
>                 URL: https://issues.apache.org/jira/browse/SPARK-3625
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Guoqiang Li
>            Assignee: Guoqiang Li
>            Priority: Blocker
>
> The reproduce code:
> {code}
> sc.setCheckpointDir(checkpointDir)
> val c = sc.parallelize((1 to 1000)).map(_ + 1)
> c.count
> val dep = c.dependencies.head.rdd
> c.checkpoint()
> c.count
> assert(dep != c.dependencies.head.rdd)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
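For reference, a minimal sketch of the ordering Sean describes: mark the RDD with checkpoint() *before* the first action, and the dependency is replaced by a CheckpointRDD once the checkpoint materializes. This assumes a local SparkContext and a writable checkpoint directory; the object name and paths here are illustrative, not from the ticket.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointOrderDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("checkpoint-demo"))
    sc.setCheckpointDir("/tmp/spark-checkpoint-demo") // hypothetical path

    val c = sc.parallelize(1 to 1000).map(_ + 1)
    val dep = c.dependencies.head.rdd

    c.checkpoint() // mark before any job runs, per the javadoc
    c.count()      // first job: computes the RDD and writes the checkpoint

    // After the checkpoint materializes, the lineage is truncated:
    // the parent RDD is now a CheckpointRDD, so the dependency changed.
    assert(dep != c.dependencies.head.rdd)

    sc.stop()
  }
}
```

With count() called first (as in the reproduction above), the already-executed job prevents the lineage from being rewritten, so the original assertion fails as documented.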