This sounds more like a use case for reduce or fold. It sounds like
you're kind of cobbling together the same function on accumulators,
when reduce/fold are simpler and have the behavior you suggest.
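
As a rough sketch of the reduce/fold idea, here is element-wise merging of array-valued partial results. This is plain Python rather than Spark code: the `partials` data and the `merge_arrays` helper are made up for illustration, and in Spark the same combine function would be passed to the RDD's reduce or fold instead.

```python
from functools import reduce


def merge_arrays(left, right):
    """Combine step: element-wise sum of two equal-length lists."""
    return [a + b for a, b in zip(left, right)]


# Hypothetical per-partition partial results (assumption, not from the thread).
partials = [
    [1, 0, 2],
    [0, 3, 1],
    [4, 1, 0],
]

# reduce: combine the partial arrays pairwise, with no explicit zero value.
total_reduce = reduce(merge_arrays, partials)

# fold: the same combine step, but starting from an explicit zero value.
zero = [0] * 3
total_fold = reduce(merge_arrays, partials, zero)

print(total_reduce)
print(total_fold)
```

Both calls give the same totals here. The fold form also handles an empty list of partials, because it starts from the zero value.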
On Fri, Nov 21, 2014 at 5:46 AM, Nathan Kronenfeld
nkronenf...@oculusinfo.com wrote:
I think I
We've done this with reduce - that definitely works.
I've reworked the logic to use accumulators because, when it works, it's
5-10x faster.
On Fri, Nov 21, 2014 at 4:44 AM, Sean Owen so...@cloudera.com wrote:
This sounds more like a use case for reduce? or fold? it sounds like
you're kind of
Hi Nathan,
It sounds like what you're asking for has already been filed as
https://issues.apache.org/jira/browse/SPARK-664 Does that ticket match
what you're proposing?
Andrew
On Fri, Nov 21, 2014 at 12:29 PM, Nathan Kronenfeld
nkronenf...@oculusinfo.com wrote:
We've done this with reduce -
I'm not sure if it's an exact match, or just very close :-)
I don't think our problem is the workload on the driver, I think it's just
memory - so while the solution proposed there would work, it would also be
sufficient for our purposes, I believe, simply to clear each block as soon
as it's added
I think I understand what is going on here, but I was hoping someone could
confirm (or explain reality if I don't) what I'm seeing.
We are collecting data using a rather sizable accumulator - essentially, an
array of tens of thousands of entries. All told, about 1.3m of data.
If I understand