Hi all, I'm filtering a DStream using a function. I need to be able to change this function while the application is running (I'm polling a service to see if a user has changed their filter). The filter function is a transformation and runs on the workers, so that's where the updates need to go. I'm not sure of the best way to do this.
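To make the shape of the problem concrete, here's a simplified, Spark-free sketch of the pattern I'm after (class and method names are just placeholders): the current predicate lives in an AtomicReference, a polling thread swaps in a new one when the service reports a change, and each batch re-reads the reference when it's processed. The hard part is that in Spark the predicate would need to reach the workers, not just the driver.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Simplified illustration (no Spark): an atomically swappable filter
// that every batch re-reads at processing time.
public class DynamicFilter {
    // Start by letting everything through.
    private final AtomicReference<Predicate<String>> current =
            new AtomicReference<>(s -> true);

    // Called by the polling thread when the user's filter changes.
    public void update(Predicate<String> newFilter) {
        current.set(newFilter);
    }

    // Called once per batch: reads the latest predicate at call time,
    // so a swapped-in filter takes effect on the next batch.
    public List<String> applyTo(List<String> batch) {
        Predicate<String> f = current.get();
        return batch.stream().filter(f).collect(Collectors.toList());
    }
}
```

In Spark terms, this reference would live on the driver, and the predicate would have to be re-captured per batch (e.g. inside a `transform` call) so each micro-batch ships the latest closure to the workers; that only works if the filter is small enough to serialize every batch, which is exactly why I was reaching for broadcast in the first place.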
Initially broadcasting seemed like the way to go, since the filter is actually quite large. But I don't think I can update something I've broadcast: I've tried unpersisting and re-creating the broadcast variable, but it became obvious this wasn't updating the reference on the workers. So am I correct in thinking I can't use broadcast variables for this purpose?

The next option seems to be stopping the JavaStreamingContext, creating a new one from the SparkContext, updating the filter function, and re-creating the DStreams (I'm using direct streams from Kafka). If I re-created the JavaStreamingContext, would the accumulators (which are created from the SparkContext) keep working? (Obviously I'm going to try this soon.)

In summary:

1) Can broadcast variables be updated?
2) Is there a better way than re-creating the JavaStreamingContext and DStreams?

Thanks,
James