Hi Daniel,

Given your example, "arr" is defined on the driver, but the function you pass to the inner "foreach" runs on the executors, so each executor mutates its own deserialized copy of the map and the driver's copy is never updated. If you want to bring the results of the RDD/DStream back to the driver, you need to call RDD.collect. Be careful, though, that you have enough memory on the driver JVM to hold the results; otherwise you'll get an OutOfMemoryError. Also, you can't update the value of a broadcast variable, since it's immutable.
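A minimal sketch of the collect-based approach, assuming a running StreamingContext and a DStream[(String, String)]. "versionsStream" is a placeholder name for the DStream in your code; this needs a Spark cluster (or local master) to actually run:

val arr = scala.collection.mutable.HashMap.empty[String, String]

versionsStream.foreachRDD { rdd =>
  // collect() executes on the driver and pulls the RDD's elements back
  // across the network; each micro-batch must fit in driver memory.
  val batch = rdd.collect()
  arr ++= batch
}

The body of foreachRDD itself runs on the driver, so mutating the driver-side map here is safe; it's only closures shipped to executors (like the one passed to rdd.foreach) that operate on copies.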
Thanks,
Silvio

From: Daniel Haviv <daniel.ha...@veracity-group.com>
Date: Sunday, May 15, 2016 at 6:23 AM
To: user <user@spark.apache.org>
Subject: "collecting" DStream data

Hi,
I have a DStream whose values I'd like to collect and broadcast. To do so I've created a mutable HashMap which I'm filling with foreachRDD, but when I check it, it remains empty. If I use an ArrayBuffer it works as expected.

This is my code:

val arr = scala.collection.mutable.HashMap.empty[String,String]
MappedVersionsToBrodcast.foreachRDD(r => {
  r.foreach(r => {
    arr += r
  })
})

What am I missing here?

Thank you,
Daniel