I have been trying to fix this bug.
The related PR:
https://github.com/apache/spark/pull/2631


------------------ Original ------------------
From:  "Xu Lijie";<lijie....@gmail.com>;
Date:  Tue, Nov 11, 2014 10:19 AM
To:  "user"<u...@spark.apache.org>; "dev"<dev@spark.apache.org>; 

Subject:  Checkpoint bugs in GraphX



Hi, all. I'm not sure whether someone has reported this bug:


There should be a checkpoint() method in EdgeRDD and VertexRDD as follows:

override def checkpoint(): Unit = { partitionsRDD.checkpoint() }


Currently, EdgeRDD and VertexRDD inherit *RDD.checkpoint()*, which only
checkpoints the edges/vertices but not the critical partitionsRDD.
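
In other words, both wrapper classes would delegate to the partitionsRDD they
wrap. A minimal sketch of how the override would sit inside each class (the
actual patch in the PR may differ in detail):

// In EdgeRDD: forward checkpoint() to the wrapped partitionsRDD so that
// its lineage is the one that actually gets truncated.
override def checkpoint(): Unit = {
  partitionsRDD.checkpoint()
}

// In VertexRDD: the same delegation.
override def checkpoint(): Unit = {
  partitionsRDD.checkpoint()
}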


Also, the variables (partitionsRDD and targetStorageLevel) in EdgeRDD and
VertexRDD should be transient:

class EdgeRDD[@specialized ED: ClassTag, VD: ClassTag](
    @transient val partitionsRDD: RDD[(PartitionID, EdgePartition[ED, VD])],
    @transient val targetStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY)
  extends RDD[Edge[ED]](partitionsRDD.context,
    List(new OneToOneDependency(partitionsRDD))) {


class VertexRDD[@specialized VD: ClassTag](
    @transient val partitionsRDD: RDD[ShippableVertexPartition[VD]],
    @transient val targetStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY)
  extends RDD[(VertexId, VD)](partitionsRDD.context,
    List(new OneToOneDependency(partitionsRDD))) {


These two bugs usually lead to a StackOverflowError in iterative applications
written with GraphX.
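
To illustrate the failure mode, here is a hypothetical iterative driver (the
names runIterations and checkpointInterval, and the mapVertices step, are
illustrative only, not from this thread). With the checkpoint() overrides
sketched above, the periodic checkpoint truncates the lineage of
partitionsRDD; without them, the dependency chain keeps growing every
iteration until task serialization overflows the stack:

import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph

// Hypothetical iterative job that checkpoints the graph periodically.
def runIterations(sc: SparkContext, initial: Graph[Double, Double],
                  numIter: Int, checkpointInterval: Int): Graph[Double, Double] = {
  sc.setCheckpointDir("/tmp/graphx-checkpoints")
  var graph = initial
  for (i <- 1 to numIter) {
    // Stand-in for the real per-iteration update.
    graph = graph.mapVertices((_, attr) => attr + 1.0)
    if (i % checkpointInterval == 0) {
      // Only effective if checkpoint() actually reaches partitionsRDD;
      // otherwise the lineage keeps growing and can end in a StackOverflowError.
      graph.vertices.checkpoint()
      graph.edges.checkpoint()
      graph.vertices.count()   // force materialization so the checkpoint is written
      graph.edges.count()
    }
  }
  graph
}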
