Thanks!

Yeah, that's what I'm doing so far, but I wanted to see if it's possible to keep
the tuples inside Spark for fault tolerance purposes.

-A
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: March-28-14 10:45 AM
To: user@spark.apache.org
Subject: Re: function state lost when next RDD is processed

As long as the amount of state being passed is relatively small, it's probably 
easiest to send it back to the driver and to introduce it into RDD 
transformations as the zero value of a fold.
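Roughly like this (an untested sketch; the socket source, batch interval, and
local master are just placeholders for your actual setup). One wrinkle to keep
in mind: RDD.fold applies its zero value once per partition and again in the
final merge, so for a plain sum it's safer to keep the per-batch zero at 0 and
combine the carried total on the driver:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("RunningSum").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Placeholder source: integers arriving one per line over a socket.
    val nums = ssc.socketTextStream("localhost", 9999).map(_.toInt)

    // Driver-side state, carried across batches/RDDs.
    var runningTotal = 0L

    nums.foreachRDD { rdd =>
      // Per-batch sum computed on the executors; the fold's zero value is
      // kept at 0 because it is applied once per partition.
      val batchSum = rdd.fold(0)(_ + _)
      // Combine with the carried state here on the driver.
      runningTotal += batchSum
      println(s"running total: $runningTotal")
    }

    ssc.start()
    ssc.awaitTermination()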

On Fri, Mar 28, 2014 at 7:12 AM, Adrian Mocanu
<amoc...@verticalscope.com> wrote:
I'd like to resurrect this thread since I don't have an answer yet.

From: Adrian Mocanu [mailto:amoc...@verticalscope.com]
Sent: March-27-14 10:04 AM
To: u...@spark.incubator.apache.org
Subject: function state lost when next RDD is processed

Is there a way to pass a custom function to Spark that runs over the entire
stream? For example, say I have a function which sums values within each RDD
and then across RDDs.

I've tried map, transform, and reduce. They all apply my sum function to a
single RDD; when the next RDD arrives, the function starts from 0 again, so
the sum from the previous RDD is lost.
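For example, roughly (where nums stands in for my input DStream[Int]):

    // reduce is applied per micro-batch: each RDD gets its own sum,
    // and the total restarts from 0 when the next RDD arrives.
    val perBatchSum = nums.reduce(_ + _)  // DStream[Int], one sum per RDD
    perBatchSum.print()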

Does Spark support a way of passing a custom function so that its state is
preserved across RDDs and not only within a single RDD?

Thanks
-Adrian

