Hi Nipun,

To expand a bit, you might find this Stack Overflow answer useful:
http://stackoverflow.com/a/39753976/3723346

Most Spark + database combinations can handle a use case like this.

Hope this helps,
Pierce

On Thu, May 4, 2017 at 9:18 AM, Gene Pang <gene.p...@gmail.com> wrote:

> As Tim pointed out, Alluxio (renamed from Tachyon) may be able to help
> you. Here is some documentation on how to run Alluxio and Spark together
> <http://www.alluxio.org/docs/1.4/en/Running-Spark-on-Alluxio.html>, and
> here is a blog post on a Spark Streaming + Alluxio use case
> <https://www.alluxio.com/blog/qunar-performs-real-time-data-analytics-up-to-300x-faster-with-alluxio>.
>
> Hope that helps,
> Gene
>
> On Tue, May 2, 2017 at 11:56 AM, Nipun Arora <nipunarora2...@gmail.com> wrote:
>
>> Hi All,
>>
>> To support our Spark Streaming-based anomaly detection tool, we have made
>> a patch to Spark 1.6.2 to dynamically update broadcast variables.
>>
>> I'll first explain our use case, which I believe should be common to
>> many people running Spark Streaming applications. Broadcast variables are
>> often used to store values such as machine learning models, which can then
>> be applied to streaming data to produce the desired results (in our case,
>> anomalies). Unfortunately, in current Spark, broadcast variables are
>> final and can only be initialized once, before the initialization of the
>> streaming context. Hence, if a new model is learned, the streaming system
>> cannot be updated without shutting down the application, broadcasting
>> again, and restarting the application. Our goal was to re-broadcast
>> variables without requiring any downtime of the streaming service.
>>
>> The key to this implementation is a live re-broadcastVariable()
>> interface, which can be triggered in between micro-batch executions,
>> without any reboot required for the streaming application. At a high
>> level, this works by re-fetching the broadcast variable's data from the
>> Spark driver and then re-distributing it to the workers. The micro-batch
>> execution is blocked while the update is made, by taking a lock on the
>> execution. We have already tested this in a prototype deployment of our
>> anomaly detection service and can successfully re-broadcast the
>> broadcast variables with no downtime.
>>
>> We would like to integrate these changes into Spark. Can anyone please
>> let me know the process for submitting patches/new features to Spark?
>> Also, I understand that the current version of Spark is 2.1; however,
>> our changes have been made and tested on Spark 1.6.2. Will this be a
>> problem?
>>
>> Thanks,
>> Nipun
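For readers following along, here is a minimal, Spark-free sketch of the locking scheme Nipun describes: reads of the current value and the re-broadcast swap contend on the same lock, so no read proceeds while an update is in flight. The `BroadcastHolder` class and its method names are hypothetical illustrations, not the actual interface of Nipun's patch or of Spark's `Broadcast` class.

```python
import threading


class BroadcastHolder:
    """Hypothetical stand-in for an updatable broadcast variable.

    Readers (standing in for micro-batch tasks) and rebroadcast() share
    one lock, so an update blocks until in-flight reads complete, and
    reads started after the swap see only the new value.
    """

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    def value(self):
        # Read the current value under the lock for a consistent snapshot.
        with self._lock:
            return self._value

    def rebroadcast(self, new_value):
        # Waits for any read currently holding the lock, then swaps
        # in the new value; no restart of the "application" is needed.
        with self._lock:
            self._value = new_value


# Usage: swap in a newly learned "model" between micro-batches.
model = BroadcastHolder({"threshold": 0.9})
print(model.value()["threshold"])   # 0.9
model.rebroadcast({"threshold": 0.7})
print(model.value()["threshold"])   # 0.7
```

In real Spark (without such a patch), a similar effect is often approximated by unpersisting the old broadcast and creating a new one from the driver between batches; the sketch above only illustrates the blocking-update semantics described in the message.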