I have been trying to fix this bug. The related PR: https://github.com/apache/spark/pull/2631
------------------ Original ------------------ From: "Xu Lijie";<lijie....@gmail.com>; Date: Tue, Nov 11, 2014 10:19 AM To: "user"<u...@spark.apache.org>; "dev"<dev@spark.apache.org>; Subject: Checkpoint bugs in GraphX Hi, all. I'm not sure whether someone has reported this bug: There should be a checkpoint() method in EdgeRDD and VertexRDD as follows: override def checkpoint(): Unit = { partitionsRDD.checkpoint() } Current EdgeRDD and VertexRDD use *RDD.checkpoint()*, which only checkpoint the edges/vertices but not the critical partitionsRDD. Also, the variables (partitionsRDD and targetStroageLevel) in EdgeRDD and VertexRDD should be transient. class EdgeRDD[@specialized ED: ClassTag, VD: ClassTag]( @transient val partitionsRDD: RDD[(PartitionID, EdgePartition[ED, VD])], @transient val targetStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY) extends RDD[Edge[ED]](partitionsRDD.context, List(new OneToOneDependency(partitionsRDD))) { class VertexRDD[@specialized VD: ClassTag]( @transient val partitionsRDD: RDD[ShippableVertexPartition[VD]], @transient val targetStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY) extends RDD[(VertexId, VD)](partitionsRDD.context, List(new OneToOneDependency(partitionsRDD))) { These two bugs usually lead to stackoverflow error in iterative application written by GraphX.