Swetha

Look at
http://spark.apache.org/docs/latest/programming-guide.html#shared-variables

Normally, when a function passed to a Spark operation (such as map or reduce)
is executed on a remote cluster node, it works on separate copies of all
the variables used in the function. These variables are copied to each
machine, and no updates to the variables on the remote machine are
propagated back to the driver program. Supporting general, read-write
shared variables across tasks would be inefficient. However, Spark does
provide two limited types of *shared variables* for two common usage
patterns: broadcast variables and accumulators.
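
For example, here is a minimal sketch of both kinds (assuming an existing
SparkContext named sc; lookupTable, missing, and the sample data are just
illustrative):

// Broadcast variable: a read-only value shipped once to every executor.
val lookupTable = sc.broadcast(Map("a" -> 1, "b" -> 2))

// Accumulator: executors may only add to it; the driver reads the total.
val missing = sc.accumulator(0, "missing keys")

val resolved = sc.parallelize(Seq("a", "b", "c")).map { key =>
  lookupTable.value.get(key) match {
    case Some(v) => v
    case None    => missing += 1; 0  // executors add; only the driver reads
  }
}
resolved.count()        // force evaluation of the lazy transformation
println(missing.value)  // read the accumulated total back on the driver

Note that neither gives you a read-write object kept consistent across
executors: a broadcast is read-only, and an accumulator is effectively
write-only from the executors' side. For per-key mutable state, the state
managed by updateStateByKey itself is the supported place to keep it,
rather than a shared TrackerClass instance.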


Deenar

On 17 October 2015 at 02:05, swetha <swethakasire...@gmail.com> wrote:

> Hi,
>
> How can I have a single reference to a class across all the executors in Spark
> Streaming? The contents of the class will be updated at all the executors.
> Would using it as a variable inside updateStateByKey guarantee that the
> reference is updated across all the executors, with no
> ConcurrentModificationException? Following is how I am trying to use a
> TrackerClass across all the JVMs.
>
> val trackerClass = new TrackerClass()
>
> val runningCounts = pairs.updateStateByKey[Int](updateFunction _)
>
> def updateFunction(newValues: Seq[Int], runningCount: Option[Int]):
> Option[Int] = {
>   val newCount = runningCount.getOrElse(0) + newValues.sum
>   getMergedSession(this.trackerClass)
>   Some(newCount)
> }
>
>
>
> Thanks,
> Swetha
