Re: Fatal error when using broadcast variables and checkpointing in Spark Streaming

2016-07-22 Thread Joe Panciera
l events num = events.value print num events.unpersist() events = sc.broadcast(num + 1) alert_stream.foreachRDD(test) # Comment this line and no error occurs ssc.checkpoint('dir') ssc.start() ssc.awaitTermination() On Fri, Jul 22, 2016 at 1:50 PM, Joe Panciera <joe.panci...@gm

Fatal error when using broadcast variables and checkpointing in Spark Streaming

2016-07-22 Thread Joe Panciera
Hi, I'm attempting to use broadcast variables to update stateful values used across the cluster for processing. Essentially, I have a function that is executed in .foreachRDD that updates the broadcast variable by calling unpersist() and then rebroadcasting. This works without issues when I

Using multiple data sources in one stream

2016-07-20 Thread Joe Panciera
Hi, I have a rather complicated situation thats raised an issue regarding consuming multiple data sources for processing. Unlike the use cases I've found, I have 3 sources of different formats. There's one 'main' stream A that does the processing, and 2 sources B and C that provide elements

Re: Variable in UpdateStateByKey Not Updating After Restarting Application from Checkpoint

2016-06-09 Thread Joe Panciera
On Wed, Jun 8, 2016 at 1:27 PM, Joe Panciera <joe.panci...@gmail.com> wrote: > I've run into an issue where a global variable used within an > UpdateStateByKey function isn't being assigned after the application > restarts from a checkpoint. Using ForEachRDD I have a

Variable in UpdateStateByKey Not Updating After Restarting Application from Checkpoint

2016-06-08 Thread Joe Panciera
I've run into an issue where a global variable used within an UpdateStateByKey function isn't being assigned after the application restarts from a checkpoint. Using ForEachRDD I have a global variable 'A' that is propagated from a file every time a batch runs, and A is then used in an