Thanks for the email. The process is to create a JIRA ticket and then post a design doc for discussion. You will of course need to update your code to work with the latest master branch, but you should wait on that until the community has a chance to comment on the design.
Cheers.

On Fri, May 5, 2017 at 8:01 AM Nipun Arora <nipunarora2...@gmail.com> wrote:
> Hi All,
>
> To support our Spark Streaming based anomaly detection tool, we have made
> a patch in Spark 1.6.2 to dynamically update broadcast variables.
>
> I'll first explain our use case, which I believe should be common to
> several people using Spark Streaming applications. Broadcast variables are
> often used to store values such as machine learning models, which can then
> be applied to streaming data to "test" and get the desired results (in our
> case, anomalies). Unfortunately, in current Spark, broadcast variables are
> final and can only be initialized once, before the initialization of the
> streaming context. Hence, if a new model is learned, the streaming system
> cannot be updated without shutting down the application, broadcasting
> again, and restarting the application. Our goal was to re-broadcast
> variables without requiring any downtime of the streaming service.
>
> The key to this implementation is a live re-broadcastVariable() interface,
> which can be triggered in between micro-batch executions, without any
> reboot required for the streaming application. At a high level, the task is
> done by re-fetching the broadcast variable's contents from the Spark driver
> and then redistributing them to the workers. Micro-batch execution is
> blocked while the update is made, by taking a lock on the execution. We
> have already tested this in a prototype deployment of our anomaly
> detection service and can successfully re-broadcast the broadcast variables
> with no downtime.
>
> We would like to integrate these changes into Spark. Can anyone please let
> me know the process for submitting patches/new features to Spark? Also, I
> understand that the current version of Spark is 2.1. However, our changes
> have been made and tested on Spark 1.6.2; will this be a problem?
>
> Thanks
> Nipun
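The locking scheme the quoted email describes (block micro-batch execution while the driver swaps in a new value) can be sketched in miniature. This is a hypothetical, Spark-free illustration of the synchronization pattern only: the class name `RefreshableBroadcast`, its methods, and the simulated `process_batch` function are all invented for this sketch and are not Spark's Broadcast API.

```python
import threading

class RefreshableBroadcast:
    """Holds a value that batch processing reads each micro-batch.

    update() takes the same lock that guards batch execution, so a
    re-broadcast never interleaves with a running batch. (Hypothetical
    sketch of the pattern described in the email, not Spark's API.)
    """

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for the execution lock

    def value(self):
        # Readers (micro-batches) take the lock briefly to get a
        # consistent snapshot of the current model.
        with self._lock:
            return self._value

    def update(self, new_value):
        # Blocks until any in-flight micro-batch releases the lock,
        # then swaps in the newly learned model.
        with self._lock:
            self._value = new_value

# Simulated use: a "model" consulted by each micro-batch.
model = RefreshableBroadcast({"threshold": 0.9})

def process_batch(records):
    # Flag records above the current threshold as "anomalies".
    thr = model.value()["threshold"]
    return [r for r in records if r > thr]

anomalies_before = process_batch([0.5, 0.95])
model.update({"threshold": 0.4})            # live re-broadcast, no restart
anomalies_after = process_batch([0.5, 0.95])
print(anomalies_before, anomalies_after)    # [0.95] [0.5, 0.95]
```

In real Spark Streaming deployments a comparable effect is often achieved by unpersisting and re-creating the broadcast variable between batches via a singleton wrapper; the patch in the email differs in updating the variable in place under a lock.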