Hi, I have the following code that uses checkpoint() to checkpoint a heavy operation. It works well in that the last heavyOpRDD.foreach(println) does not recompute from the beginning. But when I re-run the program, the RDD computing chain is recomputed from the beginning; I thought it would read from the checkpoint directory, since the data is still there from the previous run.
Do I misunderstand how checkpoint works, or is there some configuration needed to make it work? Thanks.

import org.apache.spark.{SparkConf, SparkContext}

object CheckpointTest {
  def squareWithHeavyOp(x: Int): Int = {
    Thread.sleep(2000) // simulate a heavy computation
    println(s"squareWithHeavyOp $x")
    x * x
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("CheckpointTest")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("file:///d:/checkpointDir")

    val rdd = sc.parallelize(List(1, 2, 3, 4, 5))
    val heavyOpRDD = rdd.map(squareWithHeavyOp)
    heavyOpRDD.checkpoint()

    heavyOpRDD.foreach(println) // job 0: computes and writes the checkpoint

    println("Job 0 has been finished, press ENTER to do job 1")
    readLine()

    heavyOpRDD.foreach(println) // job 1: reads from the checkpoint, no recompute
  }
}

bit1...@163.com
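For what it's worth, my current understanding is that checkpoint data is only reused within the application that wrote it (the lineage is truncated for that SparkContext), not automatically picked up by a later run. A workaround I have been considering is to save the results explicitly and reload them on the next run; a minimal sketch, assuming saveAsObjectFile/objectFile are acceptable here (the save path and the existence check are just for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PersistAcrossRuns {
  def squareWithHeavyOp(x: Int): Int = {
    Thread.sleep(2000) // simulate a heavy computation
    x * x
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("PersistAcrossRuns")
    val sc = new SparkContext(conf)
    val savePath = "file:///d:/heavyOpResult" // hypothetical output directory

    // On a re-run, load the previously saved results instead of recomputing.
    val heavyOpRDD =
      if (new java.io.File("d:/heavyOpResult").exists())
        sc.objectFile[Int](savePath)
      else {
        val computed = sc.parallelize(List(1, 2, 3, 4, 5)).map(squareWithHeavyOp)
        computed.saveAsObjectFile(savePath)
        computed
      }

    heavyOpRDD.foreach(println)
  }
}
```

Unlike checkpoint(), this makes the cross-run persistence explicit, at the cost of managing the save path yourself.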