I think you're exactly right.  I once ran 100 iterations in a single Pregel
call and hit the lineage problem right there.  I had to modify the Pregel
function to checkpoint both the graph and the newVerts RDD inside the loop
to cut off the lineage.  If you draw out the dependency graph among g, the
newVerts RDD, and the messages RDD inside the Pregel loop, you will find
that two things need to be checkpointed to really cut off the lineage: the
graph itself and one of newVerts or messages.  This is how I did it inside
the Pregel loop:
        ...
        prevG = g
        g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) => newOpt.getOrElse(old) }
        g.cache()
        if (i % 50 == 0) {
          g.checkpoint()
          newVerts.checkpoint()
        }
        ...
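
For anyone who wants to try the pattern without patching Pregel itself, here
is a minimal sketch of the same idea in a standalone loop.  It is not the
actual Pregel source: the object name CheckpointedLoop, the run method, and
the interval parameter are made up for illustration, and it assumes a Spark
version where aggregateMessages is available (1.2 or later) and that
sc.setCheckpointDir(...) has already been called.  Each round every vertex
takes the minimum of its own value and its neighbours' values, and every
interval rounds both the graph and newVerts are marked for checkpointing
before any job has run on them.

```scala
import org.apache.spark.graphx.{Graph, VertexRDD}

object CheckpointedLoop {
  /** Propagates the minimum vertex value along edges for `numIter` rounds.
    * Assumes the SparkContext already has a checkpoint directory set. */
  def run(graph: Graph[Long, Int], numIter: Int, interval: Int): Graph[Long, Int] = {
    var g: Graph[Long, Int] = graph
    var i = 1
    while (i <= numIter) {
      // Every vertex sends its current value to both of its edge neighbours.
      val messages: VertexRDD[Long] = g.aggregateMessages[Long](
        ctx => { ctx.sendToDst(ctx.srcAttr); ctx.sendToSrc(ctx.dstAttr) },
        (a, b) => math.min(a, b))

      // Vertices that received a message keep the smaller of old value and message.
      val newVerts = g.vertices.innerJoin(messages) { (_, old, msg) => math.min(old, msg) }.cache()

      val prevG = g
      g = prevG.outerJoinVertices(newVerts) { (_, old, newOpt) => newOpt.getOrElse(old) }.cache()

      if (i % interval == 0) {
        // Mark both before any action has touched them; marking later has no effect.
        g.checkpoint()
        newVerts.checkpoint()
      }

      // Run jobs on newVerts and on the graph so that marked checkpoints get written.
      newVerts.count()
      g.vertices.count()
      g.edges.count()

      // Release the previous iteration's graph, but leave the caller's input alone.
      if (prevG ne graph) prevG.unpersist(blocking = false)
      i += 1
    }
    g
  }
}
```

In this simplified loop only g is carried into the next round, so
checkpointing newVerts is there to mirror the snippet above rather than
because this particular loop strictly needs it.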

Also note: checkpointing only takes effect if it is requested before the RDD
is materialized.  If you checkpoint outside of Pregel, where the graph has
already been materialized (by the mapReduceTriplets call), nothing will
happen.  You can verify this by looking at RDD.toDebugString.  I therefore
had to apply the following workaround:
          val clonedGraph = graph.mapVertices((vid, vd) => vd).mapEdges { edge => edge.attr }
          clonedGraph.checkpoint()
          graph = clonedGraph
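
The workaround and the toDebugString check could be wrapped up roughly as
follows.  This is only a sketch: the object and method name
CloneAndCheckpoint.cloneAndCheckpoint is hypothetical, and it assumes the
checkpoint directory has already been set with sc.setCheckpointDir(...).
The identity mapVertices/mapEdges calls produce fresh RDDs on which no job
has run yet, so the checkpoint request is made in time.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx.Graph

object CloneAndCheckpoint {
  /** Builds an identical graph from fresh RDDs, marks it for checkpointing,
    * and then runs jobs on it so the checkpoint files are written. */
  def cloneAndCheckpoint[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[VD, ED] = {
    // Identity maps: same data, but new RDDs that have not been materialized.
    val cloned = graph.mapVertices((_, vd) => vd).mapEdges(edge => edge.attr)

    // Must come before the first action on the cloned graph.
    cloned.checkpoint()

    // Actions on both vertices and edges trigger the actual checkpoint.
    cloned.vertices.count()
    cloned.edges.count()

    // Print the lineage so it can be inspected by eye.
    println(cloned.vertices.toDebugString)
    println(cloned.edges.toDebugString)
    cloned
  }
}
```

Usage would be something like
graph = CloneAndCheckpoint.cloneAndCheckpoint(graph), with graph declared as
a var.  Comparing the printed lineage before and after is the easiest way to
see whether the checkpoint actually took effect on your Spark version.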





