I was curious to see that if you declare a broadcast wrapper as a var and overwrite it in the driver program, the change consistently affects every transformation/action that was defined BEFORE the overwrite but executed lazily AFTER it:

val a = sc.parallelize(1 to 10)
var broadcasted = sc.broadcast("broad")
val b = a.map(_ + broadcasted.value)
// b.persist()
for (line <- b.collect()) {
  print(line)
}

println("\n=======================================")

broadcasted = sc.broadcast("cast")
for (line <- b.collect()) {
  print(line)
}

The result is:

1broad2broad3broad4broad5broad6broad7broad8broad9broad10broad
=======================================
1cast2cast3cast4cast5cast6cast7cast8cast9cast10cast

Of course, if you persist b before the overwrite, you still get the unsurprising result (both are 10broad... because the results are persisted).

This can be useful sometimes but may cause confusion at other times (you can no longer add persist freely as a safeguard, because it may change the result).

So far I've found no documentation supporting this behavior. Can someone confirm that it's a deliberately designed feature?

Yours,
Peng

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bug-or-feature-Overwrite-broadcasted-variables-tp12315.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.