Re: Another accumulator question

2014-11-21 Thread Sean Owen
This sounds more like a use case for reduce? Or fold? It sounds like you're kind of cobbling together the same function on accumulators, when reduce/fold are simpler and have the behavior you suggest.
On Fri, Nov 21, 2014 at 5:46 AM, Nathan Kronenfeld nkronenf...@oculusinfo.com wrote: I think I
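For illustration only, here is a minimal sketch of the reduce/fold pattern mentioned above, applied to array-valued partial results. It uses plain Python and `functools.reduce`, not the Spark API. The `merge` function, the `partials` data, and the array length of 4 are all hypothetical stand-ins for whatever the real job computes per partition.

```python
from functools import reduce


def merge(a, b):
    """Element-wise sum of two equal-length lists of counts."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical per-partition partial results (assumed data, not from the thread).
partials = [
    [1, 0, 2, 0],
    [0, 3, 0, 1],
    [4, 0, 0, 0],
]

# reduce: combine the partials pairwise, with no starting value.
total_reduce = reduce(merge, partials)

# fold: the same combination, seeded with an explicit zero element.
zero = [0] * 4
total_fold = reduce(merge, partials, zero)

print(total_reduce)  # [5, 3, 2, 1]
print(total_fold)    # [5, 3, 2, 1]
```

Because `merge` is associative and commutative, the partials can be combined in any order, which is the property a distributed reduce or fold relies on.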

Re: Another accumulator question

2014-11-21 Thread Nathan Kronenfeld
We've done this with reduce - that definitely works. I've reworked the logic to use accumulators because, when it works, it's 5-10x faster.
On Fri, Nov 21, 2014 at 4:44 AM, Sean Owen so...@cloudera.com wrote: This sounds more like a use case for reduce? or fold? it sounds like you're kind of

Re: Another accumulator question

2014-11-21 Thread Andrew Ash
Hi Nathan,
It sounds like what you're asking for has already been filed as https://issues.apache.org/jira/browse/SPARK-664
Does that ticket match what you're proposing?
Andrew
On Fri, Nov 21, 2014 at 12:29 PM, Nathan Kronenfeld nkronenf...@oculusinfo.com wrote: We've done this with reduce -

Re: Another accumulator question

2014-11-21 Thread Nathan Kronenfeld
I'm not sure if it's an exact match, or just very close :-) I don't think our problem is the workload on the driver, I think it's just memory - so while the solution proposed there would work, it would also be sufficient for our purposes, I believe, simply to clear each block as soon as it's added

Another accumulator question

2014-11-20 Thread Nathan Kronenfeld
I think I understand what is going on here, but I was hoping someone could confirm (or explain reality if I don't) what I'm seeing. We are collecting data using a rather sizable accumulator - essentially, an array of tens of thousands of entries. All told, about 1.3m of data. If I understand