Hi,

I am working on a Spark application that processes graphs, and I am trying
to do the following:

- group the vertices (key: a vertex; value: the set of its outgoing edges)
- distribute the keys across separate worker processes and process each one (like a mapper)
- reduce the results back at the driver (main process)
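To make the three steps concrete, here is a pure-Python sketch of the logic I have in mind (no Spark required to run it), assuming edges are simple (src, dst) tuples and `process` is a placeholder for the real per-vertex work; the comments note the Spark pair-RDD calls that I believe would do the same step in a distributed way:

```python
# Pure-Python sketch of the group -> map -> reduce flow described above.
# Edges are (src, dst) tuples; process() is a placeholder for real work.
from collections import defaultdict
from functools import reduce

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]

# 1. group: key = vertex, value = set of its outgoing edges
#    Spark equivalent: sc.parallelize(edges).groupByKey()
grouped = defaultdict(set)
for src, dst in edges:
    grouped[src].add((src, dst))

# 2. map: process each vertex's edge set independently
#    Spark equivalent: grouped_rdd.mapValues(process)  # runs on executors
def process(edge_set):
    return len(edge_set)  # e.g. out-degree; stand-in for the real mapper

mapped = {v: process(es) for v, es in grouped.items()}

# 3. reduce: combine the per-vertex results back at the driver
#    Spark equivalent: mapped_rdd.values().reduce(lambda x, y: x + y)
total = reduce(lambda x, y: x + y, mapped.values())
print(total)  # 4 -- total number of edges
```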

Does the "groupBy" functionality distribute the keys across workers by default?
Or do we have to use RDDs explicitly to get automatic distribution?

It'd be great if you could help me understand this and suggest how to go
about solving the problem.

Thanks.
