Hi All, I have spent the last two years working with Hadoop but am new to Spark. I am planning to move one of my existing systems to Spark to take advantage of some enhanced features.
My question is: if I want to do a map-side join (something similar to the "replicated" keyword in Pig), how can I do it? Is there any way to declare an RDD as "replicated" (i.e., distribute it to all nodes so that each node holds a full copy)? I know broadcast variables can provide this kind of replication, but I am not sure what the best practice is. And if I use a broadcast variable to distribute the data set, can I then (after broadcasting) convert it into an RDD and do the join?

Regards,
Shuai
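(For context, a map-side join replicates the small data set to every worker and turns the join into a local hash lookup, avoiding a shuffle of the large data set. Below is a minimal sketch of that idea in plain Python; `small_table` and `large_records` are illustrative names, not Spark API. In actual Spark you would wrap the small set with `sc.broadcast(...)` and do the lookup inside `rdd.map(...)`.)

```python
# Sketch of a map-side (broadcast hash) join, simulated in plain Python.
# Assumption: the small side fits in memory on every worker.

small_table = {1: "apple", 2: "banana", 3: "cherry"}  # the "replicated" side
large_records = [(1, 10), (2, 20), (4, 40)]           # stand-in for the big RDD

def map_side_join(records, lookup):
    # Each worker holds a full copy of `lookup`, so the join is a
    # per-record local hash lookup -- no shuffle of `records` needed.
    return [(k, v, lookup[k]) for (k, v) in records if k in lookup]

result = map_side_join(large_records, small_table)
# keys 1 and 2 match; key 4 has no entry in the small table and is dropped
```

(In Spark proper the equivalent pattern would be roughly `bc = sc.broadcast(small_table)` followed by `large_rdd.map(lambda kv: (kv[0], kv[1], bc.value.get(kv[0])))`.)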