Re: Iterating on RDDs

2015-02-27 Thread Vijayasarathy Kannan
As you suggested, I tried to save the grouped RDD and persisted it in memory before the iterations begin. The performance seems to be much better now. My previous comment that the run times doubled was from a wrong observation. Thanks. On Fri, Feb 27, 2015 at 10:27 AM, Vijayasarathy Kannan

Re: Iterating on RDDs

2015-02-27 Thread Vijayasarathy Kannan
Thanks. I tried persist() on the RDD. The runtimes appear to have doubled now (without persist() it was ~7s per iteration and now its ~15s). I am running standalone Spark on a 8-core machine. Any thoughts on why the increase in runtime? On Thu, Feb 26, 2015 at 4:27 PM, Imran Rashid

Iterating on RDDs

2015-02-26 Thread Vijayasarathy Kannan
Hi, I have the following use case. (1) I have an RDD of edges of a graph (say R). (2) do a groupBy on R (by say source vertex) and call a function F on each group. (3) collect the results from Fs and do some computation (4) repeat the above steps until some criteria is met In (2), the groups

Re: Iterating on RDDs

2015-02-26 Thread Imran Rashid
val grouped = R.groupBy[VertexId](G).persist(StorageLeve.MEMORY_ONLY_SER) // or whatever persistence makes more sense for you ... while(true) { val res = grouped.flatMap(F) res.collect.foreach(func) if(criteria) break } On Thu, Feb 26, 2015 at 10:56 AM, Vijayasarathy Kannan