Hi All,

I have spent the last two years on Hadoop but am new to Spark.
I am planning to move one of my existing systems to Spark to get some
enhanced features.

My question is:

If I want to do a map-side join (something similar to the "replicated"
keyword in Pig), how can I do it? Is there any way to declare an RDD as
"replicated" (meaning it is distributed to all nodes and each node holds a
full copy)?
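
To make the idea concrete, here is a minimal sketch of the kind of
map-side join I have in mind, using a broadcast variable (the names
"small" and "large" and the sample data are just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("broadcast-join"))

        // Small table: assumed to fit in memory on every node.
        val small = sc.parallelize(Seq((1, "a"), (2, "b")))
        // Large table: stays distributed across the cluster.
        val large = sc.parallelize(Seq((1, "x"), (2, "y"), (3, "z")))

        // Ship the small table to every executor once.
        val smallMap = sc.broadcast(small.collectAsMap())

        // Map-side join: each record looks up its key locally, no shuffle.
        val joined = large.flatMap { case (k, v) =>
          smallMap.value.get(k).map(s => (k, (v, s)))
        }

        joined.collect().foreach(println)
        sc.stop()
      }
    }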

I know I can use a broadcast variable to get this feature, but I am not
sure what the best practice is. And if I use a broadcast variable to
distribute the data set, can I then (after the broadcast) convert it into
an RDD and do the join?
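
For the second part, this is the kind of conversion I mean (again just a
sketch, reusing the hypothetical names from above):

    // Turn the broadcast map back into a regular RDD.
    val smallAgain = sc.parallelize(smallMap.value.toSeq)
    // A normal join then works, but it shuffles, so the map-side benefit is lost?
    val joined2 = large.join(smallAgain)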

Regards,

Shuai
