Hi Kannan,

I am not sure I have understood your question exactly, but the reduceByKey or reduceByKeyLocally functionality may be better suited to your needs.
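
A minimal sketch of how these calls compare, assuming a local Spark setup and a made-up edge list of (source, destination) pairs. The object name, the sample data, and the out-degree computation are all hypothetical and only there for illustration. Any transformation on an RDD, including groupByKey, is applied per partition across the executors, so the distribution happens without extra code. reduceByKey returns a distributed RDD, while reduceByKeyLocally returns a plain Map on the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// On Spark releases before 1.3 the pair-RDD implicits also need:
// import org.apache.spark.SparkContext._

// Hypothetical example object; not taken from the original application.
object OutDegreeSketch {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process; on a cluster the master is normally
    // supplied by spark-submit instead of being hard-coded here.
    val conf = new SparkConf().setAppName("OutDegreeSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Made-up edge list: (source vertex, destination vertex).
    val edges = sc.parallelize(Seq((1L, 2L), (1L, 3L), (2L, 3L), (3L, 1L)))

    // Option A: group each vertex with its outgoing edges, then process
    // every group. mapValues runs on the executors, one partition at a time.
    val viaGroup = edges.groupByKey().mapValues(_.size)

    // Option B: reduceByKey merges values per key inside each partition
    // before the shuffle, so less data moves than with groupByKey.
    val viaReduce = edges.mapValues(_ => 1).reduceByKey(_ + _)

    // Option C: reduceByKeyLocally performs the same reduction but hands the
    // result back to the driver as a Map rather than as an RDD.
    val atDriver = edges.mapValues(_ => 1).reduceByKeyLocally(_ + _)

    // collect() brings the distributed results back to the driver.
    println(viaGroup.collect().sortBy(_._1).mkString(", "))
    println(viaReduce.collect().sortBy(_._1).mkString(", "))
    println(atDriver.toSeq.sortBy(_._1).mkString(", "))

    sc.stop()
  }
}
```

If the per-vertex work cannot be expressed as an associative, commutative reduction, Option A followed by a final collect() or reduce() is the closer match to the mapper-then-reduce flow you describe.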

Best,
Yifan LI

> On 17 Feb 2015, at 17:37, Vijayasarathy Kannan <kvi...@vt.edu> wrote:
>
> Hi,
>
> I am working on a Spark application that processes graphs and I am trying to
> do the following.
>
> - group the vertices (key - vertex, value - set of its outgoing edges)
> - distribute each key to separate processes and process them (like mapper)
> - reduce the results back at the main process
>
> Does the "groupBy" functionality do the distribution by default?
> Do we have to explicitly use RDDs to enable automatic distribution?
>
> It'd be great if you could help me understand these and how to go about with
> the problem.
>
> Thanks.