Thanks Andrew, exactly.

2015-07-22 14:26 GMT-05:00 Andrew Or <and...@databricks.com>:
> Hi Dan,
>
> `map2` is a broadcast variable, not your map. To access the map on the
> executors you need to do `map2.value(a)`.
>
> -Andrew
>
> 2015-07-22 12:20 GMT-07:00 Dan Dong <dongda...@gmail.com>:
>
>> Hi, Andrew,
>> If I broadcast the Map:
>> val map2=sc.broadcast(map1)
>>
>> I will get compilation error:
>> org.apache.spark.broadcast.Broadcast[scala.collection.immutable.Map[Int,String]]
>> does not take parameters
>> [error] val matchs= Vecs.map(term=>term.map{case (a,b)=>(map2(a),b)})
>>
>> Seems it's still an RDD, so how to access it by value=map2(key)? Thanks!
>>
>> Cheers,
>> Dan
>>
>> 2015-07-22 2:20 GMT-05:00 Andrew Or <and...@databricks.com>:
>>
>>> Hi Dan,
>>>
>>> If the map is small enough, you can just broadcast it, can't you? It
>>> doesn't have to be an RDD. Here's an example of broadcasting an array and
>>> using it on the executors:
>>> https://github.com/apache/spark/blob/c03299a18b4e076cabb4b7833a1e7632c5c0dabe/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala
>>> .
>>>
>>> -Andrew
>>>
>>> 2015-07-21 19:56 GMT-07:00 ayan guha <guha.a...@gmail.com>:
>>>
>>>> Either you have to do rdd.collect and then broadcast or you can do a
>>>> join
>>>> On 22 Jul 2015 07:54, "Dan Dong" <dongda...@gmail.com> wrote:
>>>>
>>>>> Hi, All,
>>>>>
>>>>> I am trying to access a Map from RDDs that are on different compute
>>>>> nodes, but without success. The Map is like:
>>>>>
>>>>> val map1 = Map("aa"->1,"bb"->2,"cc"->3,...)
>>>>>
>>>>> All RDDs will have to check against it to see if the key is in the Map
>>>>> or not, so it seems I have to make the Map itself global. The problem is
>>>>> that if the Map is stored as RDDs and spread across the different nodes,
>>>>> each node will only see a piece of the Map, and the info will not be
>>>>> complete enough to check against the Map (and then replace the key with
>>>>> the corresponding value). E.g.:
>>>>>
>>>>> val matchs= Vecs.map(term=>term.map{case (a,b)=>(map1(a),b)})
>>>>>
>>>>> But if the Map is not an RDD, how do I share it, e.g. with
>>>>> sc.broadcast(map1)?
>>>>>
>>>>> Any idea about this? Thanks!
>>>>>
>>>>> Cheers,
>>>>> Dan
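A minimal sketch of the fix Andrew describes. It assumes Spark 1.x or later in local mode and that `Vecs` holds sequences of `(String, Double)` pairs; the thread does not give the element types. `BroadcastLookup`, `replaceKeys` and the sample data are hypothetical. The broadcast map is read through `.value` inside the closure. `Map.get` is used instead of `apply`, so keys missing from the map are dropped rather than throwing `NoSuchElementException`. This covers Dan's "is the key in the Map" check.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast

object BroadcastLookup {
  // Hypothetical helper: replaces each key with its value from `lookup`,
  // dropping pairs whose key is not in the map (instead of throwing).
  def replaceKeys(term: Seq[(String, Double)], lookup: Map[String, Int]): Seq[(Int, Double)] =
    term.flatMap { case (a, b) => lookup.get(a).map(v => (v, b)) }

  def main(args: Array[String]): Unit = {
    // local[*] master is an assumption so the sketch runs on one machine
    val sc = new SparkContext(new SparkConf().setAppName("BroadcastLookup").setMaster("local[*]"))
    try {
      val map1 = Map("aa" -> 1, "bb" -> 2, "cc" -> 3)
      // Ship one read-only copy of map1 to each executor
      val map2: Broadcast[Map[String, Int]] = sc.broadcast(map1)

      // Hypothetical sample data standing in for Vecs
      val vecs = sc.parallelize(Seq(
        Seq("aa" -> 0.5, "zz" -> 1.0),
        Seq("cc" -> 2.0)
      ))

      // Inside the closure, go through map2.value; map2(a) does not compile
      val matchs = vecs.map(term => replaceKeys(term, map2.value))
      matchs.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Andrew's `map2.value(a)` is fine when every key is known to be present. `map2.value.get(a)` or `map2.value.getOrElse(a, default)` handles absent keys. If the lookup table starts out as an RDD, the "collect and then broadcast" route ayan mentions amounts to calling `collect().toMap` on it in the driver before `sc.broadcast`. That only makes sense when the table fits in driver and executor memory.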
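The join route ayan mentions can be sketched under the same assumptions; `JoinLookup` and `joinLookup` are hypothetical names. The sketch works as follows:

1. Each term is tagged with its index.
2. The terms are flattened into `(key, (termId, b))` pairs.
3. The pairs are inner-joined against the lookup pair RDD.
4. The results are regrouped by term.

Unlike the broadcast version, the lookup table never has to fit on one machine, at the cost of a shuffle. The order of pairs within a regrouped term is not guaranteed, and terms with no matching keys disappear from the output.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinLookup {
  // Hypothetical helper: replaces keys via a join instead of a broadcast.
  // Returns (termId, pairs), where termId is the term's position in `vecs`.
  def joinLookup(vecs: RDD[Seq[(String, Double)]],
                 mapRdd: RDD[(String, Int)]): RDD[(Long, Iterable[(Int, Double)])] = {
    // Tag each term with its index, then flatten to (key, (termId, b))
    val keyed: RDD[(String, (Long, Double))] = vecs.zipWithIndex().flatMap { case (term, id) =>
      term.map { case (a, b) => (a, (id, b)) }
    }
    // Inner join: pairs whose key is absent from mapRdd are dropped
    keyed.join(mapRdd)
      .map { case (_, ((id, b), v)) => (id, (v, b)) }
      .groupByKey()
  }

  def main(args: Array[String]): Unit = {
    // local[*] master is an assumption so the sketch runs on one machine
    val sc = new SparkContext(new SparkConf().setAppName("JoinLookup").setMaster("local[*]"))
    try {
      // Lookup table kept as a pair RDD, e.g. when it is too large to broadcast
      val mapRdd = sc.parallelize(Seq("aa" -> 1, "bb" -> 2, "cc" -> 3))
      // Hypothetical sample data standing in for Vecs
      val vecs = sc.parallelize(Seq(Seq("aa" -> 0.5, "zz" -> 1.0), Seq("cc" -> 2.0)))
      joinLookup(vecs, mapRdd).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```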