Hi Daniel,

Given your example, “arr” is defined on the driver, but the closure you pass to 
foreach inside foreachRDD runs on the executors, so each executor updates its 
own deserialized copy of the map and the driver's copy stays empty. If you want 
to collect the results of the RDD/DStream back to the driver, call RDD.collect. 
Be careful, though, that the driver JVM has enough memory to hold the results, 
otherwise you'll hit an OutOfMemoryError. Also, you can't update the value of a 
broadcast variable once it's created, since it's immutable; you'd have to 
broadcast a new one each batch.
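
A rough sketch of that pattern (untested, and assuming MappedVersionsToBrodcast 
is a DStream[(String, String)] and ssc is your StreamingContext):

@volatile var versionsBc = ssc.sparkContext.broadcast(Map.empty[String, String])

MappedVersionsToBrodcast.foreachRDD { rdd =>
  // the foreachRDD body itself runs on the driver, so collect() brings the data here
  val versions = rdd.collect().toMap
  // broadcast variables are immutable, so drop the old one and broadcast a fresh map
  versionsBc.unpersist()
  versionsBc = ssc.sparkContext.broadcast(versions)
}

Anything that runs on the executors afterwards can then read versionsBc.value.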

Thanks,
Silvio

From: Daniel Haviv <daniel.ha...@veracity-group.com>
Date: Sunday, May 15, 2016 at 6:23 AM
To: user <user@spark.apache.org>
Subject: "collecting" DStream data

Hi,
I have a DStream whose values I'd like to collect and broadcast.
To do so I've created a mutable HashMap which I'm filling with foreachRDD, but 
when I check it, it remains empty. If I use an ArrayBuffer it works as 
expected.

This is my code:

val arr = scala.collection.mutable.HashMap.empty[String,String]
MappedVersionsToBrodcast.foreachRDD(r => { r.foreach(r => { arr += r }) })


What am I missing here?

Thank you,
Daniel
