Hi Nipun,

To expand a bit, you might find this Stack Overflow answer useful:
http://stackoverflow.com/a/39753976/3723346

Most Spark + database combinations can handle a use case like this.

Hope this helps,
Pierce

On Thu, May 4, 2017 at 9:18 AM, Gene Pang <gene.p...@gmail.com> wrote:

> As Tim pointed out, Alluxio (renamed from Tachyon) may be able to help
> you. Here is some documentation on how to run Alluxio and Spark together
> <http://www.alluxio.org/docs/1.4/en/Running-Spark-on-Alluxio.html>, and
> here is a blog post on a Spark Streaming + Alluxio use case
> <https://www.alluxio.com/blog/qunar-performs-real-time-data-analytics-up-to-300x-faster-with-alluxio>.
>
> Hope that helps,
> Gene
>
> On Tue, May 2, 2017 at 11:56 AM, Nipun Arora <nipunarora2...@gmail.com> wrote:
>
>> Hi All,
>>
>> To support our Spark Streaming-based anomaly detection tool, we have made
>> a patch to Spark 1.6.2 to dynamically update broadcast variables.
>>
>> I'll first explain our use case, which I believe should be common to
>> many people running Spark Streaming applications. Broadcast variables are
>> often used to store values such as machine learning models, which can then
>> be applied to streaming data to produce the desired results (in our case,
>> anomalies). Unfortunately, in current Spark, broadcast variables are
>> final and can only be initialized once, before the initialization of the
>> streaming context. Hence, if a new model is learned, the streaming system
>> cannot be updated without shutting down the application, broadcasting
>> again, and restarting the application. Our goal was to re-broadcast
>> variables without requiring any downtime of the streaming service.
>>
>> The key to this implementation is a live re-broadcastVariable()
>> interface, which can be triggered in between micro-batch executions,
>> without any reboot required for the streaming application. At a high
>> level, this works by re-fetching the broadcast variable's data from the
>> Spark driver and then re-distributing it to the workers. The micro-batch
>> execution is blocked while the update is made, by taking a lock on the
>> execution. We have already tested this in a prototype deployment of our
>> anomaly detection service and can successfully re-broadcast the
>> broadcast variables with no downtime.
>>
>> We would like to integrate these changes into Spark. Can anyone please
>> let me know the process for submitting patches/new features to Spark?
>> Also, I understand that the current version of Spark is 2.1; however,
>> our changes have been made and tested on Spark 1.6.2. Will this be a
>> problem?
>>
>> Thanks,
>> Nipun
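For readers following along, here is a minimal, Spark-free sketch of the locking scheme Nipun describes: reads of the current value and the re-broadcast swap contend on the same lock, so no read proceeds while an update is in flight. The `BroadcastHolder` class and its method names are hypothetical illustrations, not the actual interface of Nipun's patch or of Spark's `Broadcast` class.

```python
import threading


class BroadcastHolder:
    """Hypothetical stand-in for an updatable broadcast variable.

    Readers (standing in for micro-batch tasks) and rebroadcast() share
    one lock, so an update blocks until in-flight reads complete, and
    reads started after the swap see only the new value.
    """

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    def value(self):
        # Read the current value under the lock for a consistent snapshot.
        with self._lock:
            return self._value

    def rebroadcast(self, new_value):
        # Waits for any read currently holding the lock, then swaps
        # in the new value; no restart of the "application" is needed.
        with self._lock:
            self._value = new_value


# Usage: swap in a newly learned "model" between micro-batches.
model = BroadcastHolder({"threshold": 0.9})
print(model.value()["threshold"])   # 0.9
model.rebroadcast({"threshold": 0.7})
print(model.value()["threshold"])   # 0.7
```

In real Spark (without such a patch), a similar effect is often approximated by unpersisting the old broadcast and creating a new one from the driver between batches; the sketch above only illustrates the blocking-update semantics described in the message.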