Hi, I am working on a Spark application that processes graphs, and I am trying to do the following:

- group the vertices (key: vertex, value: set of its outgoing edges)
- distribute each key to a separate worker and process it there (like a mapper)
- reduce the results back at the main process (the driver)

Does the "groupBy" functionality do the distribution by default? Do we have to explicitly use RDDs to enable automatic distribution? It'd be great if you could help me understand these points and how to go about solving the problem. Thanks.