Read from file and broadcast before every Spark Streaming bucket?

2015-01-29 Thread YaoPau
I'm creating a real-time visualization of counts of ads shown on my website,
using data pushed through by Spark Streaming.

To avoid clutter, the visualization only looks good with 4 or 5 lines shown at
once (corresponding to 4 or 5 different ads), but there are 50+ different ads
that show on my site.

What I'd like to do is quickly change which ads get pumped through Spark
Streaming, without having to rebuild the .jar and push it to my edge node.
Ideally I'd keep a .csv file on my edge node with a list of 4 ad names, and
every time a StreamRDD is created it would read from that tiny file, create a
broadcast variable, and use that variable as a filter. That way I could just
open up the .csv file, edit it, save it, and the stream would automatically
filter correctly.

I keep getting errors when I try this. Has anyone had success with a broadcast
variable that updates with each new StreamRDD?
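
For reference, a minimal sketch of the pattern described above, in Scala
against the Spark Streaming API. The socket source, batch interval, and
/path/to/ads.csv are placeholders, not part of the original post; the idea is
that transform() runs its body on the driver once per batch, so the tiny CSV
can be re-read and re-broadcast there rather than on the executors.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.io.Source

object AdFilterStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AdFilterStream")
    val ssc = new StreamingContext(conf, Seconds(10))   // batch interval is a placeholder

    // Hypothetical input stream of ad names, one per line (could be Kafka, Flume, etc.).
    val adEvents = ssc.socketTextStream("localhost", 9999)

    // transform() runs its body on the driver once per batch, so the tiny CSV
    // can be re-read and re-broadcast on every interval.
    val filtered = adEvents.transform { rdd =>
      val src = Source.fromFile("/path/to/ads.csv")      // hypothetical file on the edge node
      val wanted = try {
        src.getLines().map(_.trim).filter(_.nonEmpty).toSet
      } finally {
        src.close()
      }
      val wantedBc = rdd.sparkContext.broadcast(wanted)
      rdd.filter(ad => wantedBc.value.contains(ad))
    }

    filtered.countByValue().print()

    ssc.start()
    ssc.awaitTermination()
  }
}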





Re: Read from file and broadcast before every Spark Streaming bucket?

2015-01-30 Thread Sean Owen
You should say what errors you see, but I assume it's because you're trying to
create broadcast variables on the executors. Why? It sounds like you already
have the data you want available everywhere to read locally.
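
One way to read this suggestion, assuming the ad list lives where the driver
runs: skip the broadcast entirely and capture the freshly read Set in the
filter closure. This reuses the hypothetical adEvents stream and file path
from the sketch above; for a Set of 4 or 5 strings, Spark will happily ship it
with each task.

// Variant of the earlier sketch without a broadcast variable. transform() still
// runs on the driver each batch; the small Set is captured in the task closure.
val filtered = adEvents.transform { rdd =>
  val src = scala.io.Source.fromFile("/path/to/ads.csv")   // assumed path on the driver
  val wanted = try src.getLines().map(_.trim).filter(_.nonEmpty).toSet finally src.close()
  rdd.filter(ad => wanted.contains(ad))
}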