What I think is happening is that the map operations are executed concurrently
and the map operation in rdd2 has the initial copy of myObjectBroadcasted.
Is there a way to apply the transformations sequentially, i.e. first materialize
rdd1 and then rdd2?
Thanks a lot!
On 24 February 2015 at 18:49,
Hi Yiannis,
Broadcast variables are meant for *immutable* data. They are not meant for
data structures that you intend to update. (It might *happen* to work when
running in local mode, though I doubt it, and it would probably be a bug if it
did. It will certainly not work when running on a
Sorry for the mistake, I actually have it this way:
val myObject = new MyObject();
val myObjectBroadcasted = sc.broadcast(myObject);
val rdd1 = sc.textFile("/file1").map(e =>
{
  myObjectBroadcasted.value.insert(e._1);
  (e._1,1)
});
rdd1.cache.count(); //to make sure it is transformed.
val rdd2 =
You're not using the broadcast variable within your map operations. You're
attempting to modify myObject directly, which won't work because you are
modifying the serialized copy on the executor. You want to do
myObjectBroadcasted.value.insert and myObjectBroadcasted.value.lookup.
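Since broadcast variables are meant for immutable data, one possible way to get the "first rdd1, then rdd2" ordering asked about in this thread is to finish the first computation with an action, build the lookup structure on the driver, and only then broadcast it. The following is a minimal sketch under assumptions, not the original poster's code: the object name BroadcastSketch, the helper keepKnown, and the comma-separated "key,value" line format are all hypothetical, and a plain Set[String] stands in for MyObject. It also assumes the set of keys is small enough to fit in driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  // Hypothetical helper: keeps the lines of `second` whose key also occurs in `first`.
  // Lines are assumed to look like "key,value".
  def keepKnown(sc: SparkContext, first: Seq[String], second: Seq[String]): Array[String] = {
    val rdd1 = sc.parallelize(first)
    val rdd2 = sc.parallelize(second)

    // Step 1: collect() is an action, so rdd1 is fully evaluated here,
    // before anything involving rdd2 is even defined.
    val keys: Set[String] = rdd1.map(_.split(",")(0)).distinct().collect().toSet

    // Step 2: broadcast the finished, immutable set (never mutated afterwards).
    val keysBroadcast = sc.broadcast(keys)

    // Step 3: executors only read from the broadcast value.
    rdd2.filter(line => keysBroadcast.value.contains(line.split(",")(0))).collect()
  }

  def main(args: Array[String]): Unit = {
    // local[*] is used here only so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val matched = keepKnown(sc, Seq("a,1", "b,2", "a,3"), Seq("a,x", "c,y", "b,z"))
      matched.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The design choice in this sketch is that all "inserts" happen as an ordinary RDD computation whose result is collected, so the broadcast value is complete before any task of the second transformation reads it.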