As you suggested, I tried to save the grouped RDD and persisted it in
memory before the iterations begin. The performance seems to be much better
now.
My previous comment that the run times doubled was from a wrong observation.
Thanks.
On Fri, Feb 27, 2015 at 10:27 AM, Vijayasarathy Kannan
Thanks.
I tried persist() on the RDD. The runtimes appear to have doubled now
(without persist() it was ~7s per iteration and now its ~15s). I am running
standalone Spark on a 8-core machine.
Any thoughts on why the increase in runtime?
On Thu, Feb 26, 2015 at 4:27 PM, Imran Rashid
Hi,
I have the following use case.
(1) I have an RDD of edges of a graph (say R).
(2) do a groupBy on R (by say source vertex) and call a function F on each
group.
(3) collect the results from Fs and do some computation
(4) repeat the above steps until some criteria is met
In (2), the groups
val grouped = R.groupBy[VertexId](G).persist(StorageLeve.MEMORY_ONLY_SER)
// or whatever persistence makes more sense for you ...
while(true) {
val res = grouped.flatMap(F)
res.collect.foreach(func)
if(criteria)
break
}
On Thu, Feb 26, 2015 at 10:56 AM, Vijayasarathy Kannan