Iterating on RDDs

Vijayasarathy Kannan Thu, 26 Feb 2015 09:30:11 -0800

Hi,

I have the following use case.


(1) I have an RDD of edges of a graph (say R).
(2) Do a groupBy on R (by, say, source vertex) and call a function F on each
group.
(3) Collect the results of F and do some computation.
(4) Repeat the above steps until some criterion is met.

In (2), the groups are always going to be the same (since R is grouped by
source vertex).

Question:
Is R redistributed on every iteration (at step (2)), or is it distributed only
once, when it is created?

A sample code snippet is below.

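var done = false
while (!done) {
  // R is regrouped by source vertex (via G) on every pass
  val res = R.groupBy[VertexId](G).flatMap(F)
  res.collect().foreach(func)
  // Scala has no built-in break statement, so stop via a flag
  done = criteria
}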

Since the groups remain the same, what is the best way to go about
implementing the above logic?
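
For illustration, one variant of the loop above would do the groupBy once, outside the loop, and persist the grouped RDD so that each iteration only runs F over the already-grouped data. This is only a sketch under assumptions: the object name, the Edge type, the stand-in f, and the maxRounds stopping rule are hypothetical placeholders, not part of the actual job.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object IterateOnGroups {
  type VertexId = Long
  type Edge = (VertexId, VertexId) // (source, destination); hypothetical edge type

  // Hypothetical stand-in for F: emits (source vertex, out-degree + round).
  def f(round: Int, group: (VertexId, Iterable[Edge])): Seq[(VertexId, Int)] =
    Seq((group._1, group._2.size + round))

  def run(sc: SparkContext, edges: Seq[Edge], maxRounds: Int): Array[(VertexId, Int)] = {
    val r: RDD[Edge] = sc.parallelize(edges)

    // Group by source vertex once and keep the grouped partitions in memory,
    // instead of calling groupBy inside the loop.
    val grouped: RDD[(VertexId, Iterable[Edge])] =
      r.groupBy((e: Edge) => e._1).persist(StorageLevel.MEMORY_ONLY)

    var round = 0
    var last = Array.empty[(VertexId, Int)]
    var done = false
    while (!done) {
      val current = round // copy into a val so the closure does not capture the var
      last = grouped.flatMap(g => f(current, g)).collect()
      round += 1
      done = round >= maxRounds // placeholder for the real stopping criterion
    }

    grouped.unpersist()
    last
  }
}
```

Whether this actually avoids repeated shuffling depends on the grouped data fitting in the chosen storage level; partitions that are evicted would be recomputed from the lineage.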
