Re: Broadcast Variable updated from one transformation and used from another

2015-02-25 Thread Yiannis Gkoufas
What I think is happening is that the map operations are executed concurrently, so the map operation in rdd2 sees only the initial copy of myObjectBroadcasted. Is there a way to apply the transformations sequentially, first materializing rdd1 and then rdd2? Thanks a lot! On 24 February 2015 at 18:49,

Re: Broadcast Variable updated from one transformation and used from another

2015-02-25 Thread Imran Rashid
Hi Yiannis, Broadcast variables are meant for *immutable* data. They are not meant for data structures that you intend to update. (It might *happen* to work when running in local mode, though I doubt it, and it would probably be a bug if it did. It will certainly not work when running on a
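
One way to get the effect asked about in this thread without mutating a broadcast value is to compute the state from rdd1 as an ordinary RDD result, bring it back to the driver with an action, and only then broadcast it for read-only use in rdd2. The following is a minimal sketch, not the posters' code: it assumes the collected keys fit in driver memory, and the object name, the in-memory sample data (standing in for /file1 and the second input), and the comma-separated line layout are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastAfterCollect {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption for the sketch; a real job would take it from spark-submit.
    val conf = new SparkConf().setAppName("broadcast-after-collect").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data standing in for the two inputs in the thread.
      val rdd1 = sc.parallelize(Seq("a,1", "b,2", "a,3"))
      val rdd2 = sc.parallelize(Seq("a", "c"))

      // Step 1: derive the keys from rdd1. collect() is an action, so rdd1 is
      // fully evaluated here, before anything touching rdd2 runs.
      val keys: Set[String] = rdd1.map(_.split(",")(0)).distinct().collect().toSet

      // Step 2: broadcast the finished, immutable set.
      val keysBroadcast = sc.broadcast(keys)

      // Step 3: read (never write) the broadcast value inside the second map.
      val flagged = rdd2.map(k => (k, keysBroadcast.value.contains(k))).collect()
      flagged.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the broadcast is created only after the action on rdd1 has returned, the ordering the question asks for follows from the driver program itself. If the derived state is too large to collect, a join between the two RDDs is the usual alternative.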

Re: Broadcast Variable updated from one transformation and used from another

2015-02-24 Thread Yiannis Gkoufas
Sorry for the mistake, I actually have it this way:

    val myObject = new MyObject();
    val myObjectBroadcasted = sc.broadcast(myObject);
    val rdd1 = sc.textFile("/file1").map(e => { myObjectBroadcasted.value.insert(e._1); (e._1, 1) });
    rdd1.cache.count(); // to make sure it is transformed
    val rdd2 =

RE: Broadcast Variable updated from one transformation and used from another

2015-02-24 Thread Ganelin, Ilya
You're not using the broadcasted variable within your map operations. You're attempting to modify myObject directly, which won't work because you are modifying the serialized copy on the executor. You want to do myObjectBroadcasted.value.insert and myObjectBroadcasted.value.lookup.
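
The "serialized copy" point can be illustrated without Spark at all. The sketch below round-trips an object through plain Java serialization, which roughly approximates what happens when a value is shipped to an executor, and shows that mutating the copy leaves the original untouched. The MyObject class here is a hypothetical stand-in (the thread never shows its definition), and roundTrip is an illustrative helper, not a Spark API.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

// Hypothetical stand-in for the MyObject class mentioned in the thread.
class MyObject extends Serializable {
  private val seen = mutable.Set.empty[String]
  def insert(key: String): Unit = seen += key
  def lookup(key: String): Boolean = seen.contains(key)
}

object SerializedCopyDemo {
  // Serialize a value to bytes and read it back, producing an independent copy.
  def roundTrip[T <: Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val driverSide = new MyObject
    val executorSide = roundTrip(driverSide) // the "copy on the executor"

    executorSide.insert("key1")

    println(executorSide.lookup("key1")) // true: the copy was updated
    println(driverSide.lookup("key1"))   // false: the original never sees the change
  }
}
```

The same reasoning applies to separate tasks on different executors: each works on its own deserialized copy, so an insert made in one map is not visible to a lookup in another.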