This sounds more like a use case for reduce or fold. It sounds like you're cobbling together the same functionality on top of accumulators, when reduce/fold are simpler and already have the behavior you're asking for.
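To make the idea concrete, here is a minimal sketch of the merge pattern in plain Python. It is not Spark code. The names `merge`, `task_results`, and `fold_results` are hypothetical, and the generator only stands in for per-task partial arrays arriving one at a time. The point is that a fold needs only the running total plus one partial result in memory at any moment.

```python
from functools import reduce
from typing import Iterable, Iterator, List


def merge(acc: List[int], part: List[int]) -> List[int]:
    """Add `part` element-wise into `acc` in place and return `acc`."""
    for i, v in enumerate(part):
        acc[i] += v
    return acc


def task_results(num_tasks: int, size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for per-task arrays, yielded one at a time."""
    for t in range(num_tasks):
        yield [t] * size


def fold_results(parts: Iterable[List[int]], size: int) -> List[int]:
    # Start from a zero array (the fold's zero value) and merge each
    # partial result as it arrives; earlier partials can be freed.
    return reduce(merge, parts, [0] * size)


# 400 tasks, each contributing a small array of 8 entries.
total = fold_results(task_results(400, 8), 8)
```

In Spark itself the analogous calls would be along the lines of `RDD.reduce`, `RDD.fold`, or `RDD.aggregate`, with the element-wise merge as the combine function. Whether the driver then merges results as they arrive depends on the Spark version, so treat that as something to verify rather than a guarantee.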
On Fri, Nov 21, 2014 at 5:46 AM, Nathan Kronenfeld <nkronenf...@oculusinfo.com> wrote:
> I think I understand what is going on here, but I was hoping someone could
> confirm (or explain reality if I don't) what I'm seeing.
>
> We are collecting data using a rather sizable accumulator - essentially, an
> array of tens of thousands of entries. All told, about 1.3m of data.
>
> If I understand things correctly, it looks to me like, when our job is done,
> a copy of this array is retrieved from each individual task, all at once,
> for combination on the client - which means, with 400 tasks to the job, each
> collection is using up half a gig of memory on the client.
>
> Is this true? If so, does anyone know a way to get accumulators to
> accumulate as results collect, rather than all at once at the end, so we
> only have to hold a few in memory at a time, rather than all 400?
>
> Thanks,
> -Nathan
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone: +1-416-203-3003 x 238
> Email: nkronenf...@oculusinfo.com