You can use a broadcast variable. See also this thread: http://search-hadoop.com/m/JW1q5GX7U22/Spark+broadcast+variable&subj=How+Broadcast+variable+scale+
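A minimal sketch of the suggestion, assuming a SparkContext named `sc` is already available (e.g. in a pyspark shell); `rdd1`, `dict1`, and the helper `translate` follow the names in the quoted question:

```python
dict1 = {"a": 1, "b": 2, "c": 3}

def translate(line, mapping):
    # Look up each token in the mapping; with a broadcast variable
    # this runs on the executors against a local read-only copy.
    return [mapping[item] for item in line]

# In a Spark job (requires a running SparkContext `sc`):
# b_dict = sc.broadcast(dict1)                              # ship dict1 once per executor
# rdd2 = rdd1.map(lambda line: translate(line, b_dict.value))
```

Broadcasting ships `dict1` to each executor once instead of serializing it into every task closure, which is what makes the lookup cheap at scale.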
> On Mar 31, 2015, at 4:43 AM, Peng Xia <sparkpeng...@gmail.com> wrote:
>
> Hi,
>
> I have an RDD (rdd1) where each line is split into an array ["a", "b", "c"], etc.
> I also have a local dictionary (dict1) that stores key-value pairs {"a": 1, "b": 2, "c": 3}.
> I want to replace each key in the RDD with its corresponding value in the dict:
> rdd1.map(lambda line: [dict1[item] for item in line])
>
> But this task is not distributed; I believe the reason is that dict1 is a local instance.
> Can anyone provide suggestions on how to parallelize this?
>
> Thanks,
> Best,
> Peng