My use case has one large data stream (DS1) that maps naturally to a DStream. Processing DS1 involves filtering it against a set of known values, and that set changes over time, though slowly by streaming standards. If the filter set were static, it would map naturally to a broadcast variable, but it's dynamic. (And I don't think implementing it as a second DStream works, because new values need to be copied redundantly to every executor, not partitioned among the executors.)
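For concreteness, here's the kind of driver-side wrapper I've been sketching to "update" the broadcast by unpersisting and re-broadcasting (untested; all the names here are mine, and loadFilterValues() stands in for however the current set would actually be fetched):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Driver-side holder for a broadcast value that can be replaced.
// refresh() unpersists the old broadcast and creates a new one; closures
// that read value() in subsequent batches see the new version.
class RefreshableBroadcast[T: scala.reflect.ClassTag](
    sc: SparkContext, load: () => T) extends Serializable {

  @transient @volatile private var bc: Broadcast[T] = sc.broadcast(load())

  def value: Broadcast[T] = bc

  def refresh(): Unit = {
    bc.unpersist(blocking = false)  // drop the stale copy on executors
    bc = sc.broadcast(load())       // re-broadcast the current set
  }
}

// Usage sketch: the body of foreachRDD runs on the driver, so the
// broadcast can be refreshed there before each batch is processed.
// val filterBc = new RefreshableBroadcast(sc, loadFilterValues)
// ds1.foreachRDD { rdd =>
//   filterBc.refresh()
//   val current = filterBc.value
//   rdd.filter(rec => current.value.contains(rec)).foreach(process)
// }
```

I don't know whether re-broadcasting every batch like this is safe or idiomatic, which is really what my questions below are about.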
Looking at the Spark and Spark Streaming documentation, I have two questions:

1) The Spark Streaming Programming Guide makes no mention of broadcast variables. Do they coexist properly?

2) Once I have a broadcast variable in place in the "periodic function" that Spark Streaming executes, how can I update its value? Obviously I can't literally mutate the broadcast variable, which is immutable, but how can I get a new version of it established on all the executors?

(And the other ever-present implicit question...)

3) Is there a better way to implement this?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-blend-a-DStream-and-a-broadcast-variable-tp18227.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.