One approach would be to store the batch statistics in intermediate storage
(such as HBase, MySQL, or even ZooKeeper); inside your filter function,
read the previously stored value from that storage and apply whatever
operation you need.
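A rough sketch of that pattern is below. It is only illustrative: the in-memory `statsStore` stands in for an external store such as HBase or MySQL (a real job would open a connection per executor instead), and the names `statsStore` and `process` are made up for this example. The key point is that the batch average is computed on the driver inside `transform`, so the filter closure only captures a plain `Double` and serializes cleanly.

```scala
import org.apache.spark.streaming.dstream.DStream
import scala.collection.concurrent.TrieMap

// Illustrative stand-in for an external store (HBase/MySQL/ZooKeeper).
val statsStore = TrieMap[String, Double]()

def process(stream: DStream[(Int, Int)]): DStream[(Int, Int)] =
  stream.transform { rdd =>
    // This block runs on the driver once per batch, so RDD actions are fine.
    val avgFirst =
      if (rdd.isEmpty()) 0.0
      else rdd.map(_._1.toDouble).mean()

    // Persist the batch statistic so later batches (or other jobs) can read it.
    statsStore.put("avgFirst", avgFirst)

    // avgFirst is a local Double; the closure below has no driver-side
    // objects in it, which avoids the "object not serializable" problem.
    rdd.filter(_._1 > avgFirst)
  }
```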

Thanks
Best Regards

On Sun, Jul 26, 2015 at 3:37 AM, foobar <heath...@fb.com> wrote:

> Hi, I'm working with Spark Streaming in Scala and trying to figure out the
> following problem. In my DStream[(Int, Int)], each record is a pair of
> ints. For each batch, I would like to filter out all records whose first
> integer is below the average of the first integers in that batch, and then,
> for the remaining records (first integer above the average), compute the
> average of their second integers. What's the best practice for implementing
> this? I tried, but kept getting an "object not serializable" exception,
> because it's hard to share variables (such as the average of the first ints
> in the batch) between the workers and the driver. Any suggestions? Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Multiple-operations-on-same-DStream-in-Spark-Streaming-tp23995.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
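For reference, the per-batch computation described in the question can be sketched without Spark at all; the function name `aboveAvgSecondMean` is invented here, and inside a streaming job this logic would run on one batch's data at a time:

```scala
// Drop pairs whose first element is at or below the batch average of first
// elements, then average the second elements of the survivors.
def aboveAvgSecondMean(batch: Seq[(Int, Int)]): Option[Double] =
  if (batch.isEmpty) None
  else {
    val avgFirst = batch.map(_._1).sum.toDouble / batch.size
    val above = batch.filter(_._1 > avgFirst)
    if (above.isEmpty) None
    else Some(above.map(_._2).sum.toDouble / above.size)
  }

// e.g. aboveAvgSecondMean(Seq((1, 10), (3, 20), (5, 30)))
// average of firsts is 3, only (5, 30) survives, so the result is Some(30.0)
```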
