Repository: spark
Updated Branches:
  refs/heads/branch-1.0 4ac8135dc -> 715fbfab9


[SPARK-2025] Unpersist edges of previous graph in Pregel

Due to a bug introduced by apache/spark#497, Pregel does not unpersist 
replicated vertices from previous iterations. As a result, they stay cached 
until memory is full, wasting GC time.

This PR corrects the problem by unpersisting both the edges and the replicated 
vertices of previous iterations. This is safe because the edges and replicated 
vertices of the current iteration are cached by the call to `g.cache()` and 
then materialized by the call to `messages.count()`. Therefore no 
unmaterialized RDDs depend on `prevG.edges`. I verified that no recomputation 
occurs by running PageRank with a custom patch to Spark that warns when a 
partition is recomputed.
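The safety argument above depends on ordering. First cache the new graph, then force it to be computed, and only after that drop the previous one. The toy model below sketches that ordering in plain Scala. It does not use Spark: `Node`, `Policy` and `PregelCacheModel` are hypothetical names, and it assumes that a dataset whose cached parent has been dropped must recompute its whole lineage. It is an illustration of the reasoning, not the GraphX implementation.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a cacheable, lazily computed dataset (NOT Spark's RDD).
final class Node(val name: String, parent: Option[Node], log: mutable.Buffer[String]) {
  private var cached = false
  private var hasData = false // true once computed while cached

  def cache(): this.type = { cached = true; this }
  def unpersist(): Unit = { cached = false; hasData = false }
  def isCached: Boolean = cached

  // Computing a node needs its parent's data; a parent whose data is not
  // held in cache is (re)computed first, recursively up the lineage.
  def compute(): Unit = if (!hasData) {
    parent.foreach(_.compute())
    log += name        // record every actual computation
    hasData = cached   // data is retained only if the node is cached
  }
}

object PregelCacheModel {
  sealed trait Policy
  case object NeverUnpersist extends Policy       // models the leak
  case object UnpersistAfterCount extends Policy  // models the fix
  case object UnpersistBeforeCount extends Policy // unsafe ordering, for contrast

  /** Returns (every computation performed, names still cached at the end). */
  def run(iters: Int, policy: Policy): (List[String], List[String]) = {
    val log = mutable.Buffer.empty[String]
    val nodes = mutable.Buffer.empty[Node]
    var g: Node = new Node("g0", None, log).cache()
    nodes += g
    g.compute()
    for (i <- 1 to iters) {
      val prevG = g
      g = new Node(s"g$i", Some(prevG), log).cache() // plays the role of g.cache()
      nodes += g
      if (policy == UnpersistBeforeCount) prevG.unpersist() // g not yet materialized
      g.compute()                                           // plays the role of messages.count()
      if (policy == UnpersistAfterCount) prevG.unpersist()  // safe: g already materialized
    }
    (log.toList, nodes.filter(_.isCached).map(_.name).toList)
  }
}
```

With three iterations, `NeverUnpersist` leaves all four graphs cached. `UnpersistAfterCount` computes each graph exactly once and leaves only `g3` cached. `UnpersistBeforeCount` also ends with only `g3` cached, but it recomputes the full lineage on every iteration, for 10 computations in total.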

Thanks to Tim Weninger for reporting this bug.

Author: Ankur Dave

Closes #972 from ankurdave/SPARK-2025 and squashes the following commits:

13d5b07 [Ankur Dave] Unpersist edges of previous graph in Pregel

(cherry picked from commit 9bad0b73722fb359f14db864e69aa7efde3588c5)
Signed-off-by: Reynold Xin


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/715fbfab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/715fbfab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/715fbfab

Branch: refs/heads/branch-1.0
Commit: 715fbfab9b94223ee6cb167cb69e1895ac7101e3
Parents: 4ac8135
Author: Ankur Dave
Authored: Thu Jun 5 17:45:38 2014 -0700
Committer: Reynold Xin
Committed: Thu Jun 5 17:45:54 2014 -0700

----------------------------------------------------------------------
 graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/715fbfab/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
----------------------------------------------------------------------
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala b/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
index 4572eab..5e55620 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
@@ -150,6 +150,7 @@ object Pregel extends Logging {
       oldMessages.unpersist(blocking=false)
       newVerts.unpersist(blocking=false)
       prevG.unpersistVertices(blocking=false)
+      prevG.edges.unpersist(blocking=false)
       // count the iteration
       i += 1
     }
