My use case has one large data stream (DS1) that obviously maps to a DStream. 
The processing of DS1 involves filtering it for any of a set of known
values, which will change over time, though slowly by streaming standards. 
If the filter data were static, it seems to obviously map to a broadcast
variable, but it's dynamic.  (And I don't think it works to implement it as
a DStream, because the new values need to be copied redundantly to all
executors, not partitioned among the executors.)

Looking at the Spark and Spark Streaming documentation, I have two
questions:

1) There's no mention in the Spark Streaming Programming Guide of broadcast
variables.  Do they coexist properly?

2) Once I have a broadcast variable in place in the "periodic function" that
Spark Streaming executes, how can I update its value?  Obviously I can't
literally update the value of that broadcast variable, which is immutable,
but how can I get a new version of the variable established in all the
executors?

(And the other ever-present implicit question...)

3) Is there a better way to implement this?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-blend-a-DStream-and-a-broadcast-variable-tp18227.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to