Re: Accumulator question

2016-05-10 Thread Abi
On May 9, 2016 8:24:06 PM EDT, Abi wrote:
> I am splitting an integer array in 2 partitions and using an accumulator to sum the array. problem is
>
> 1. I am not seeing execution time becoming half of a linear summing.
>
> 2. The second node (from looking at

Re: Accumulator question

2016-05-10 Thread Rishi Mishra
Your mail does not describe much, but won't a simple reduce function help you? Something like the below:

val data = Seq(1,2,3,4,5,6,7)
val rdd = sc.parallelize(data, 2)
val sum = rdd.reduce((a,b) => a+b)

Regards, Rishitesh Mishra, SnappyData (http://www.snappydata.io/)

Re: Accumulator question

2016-05-09 Thread Abi
I am splitting an integer array into 2 partitions and using an accumulator to sum the array. The problem is: 1. I am not seeing the execution time become half that of a linear sum. 2. The second node (from looking at timestamps) takes 3 times as long as the first node. This gives the impression it is

Accumulator question

2016-05-09 Thread Abi
I am splitting an integer array into 2 partitions and using an accumulator to sum the array. The problem is: 1. I am not seeing the execution time become half that of a linear sum. 2. The second node (from looking at timestamps) takes 3 times as long as the first node. This gives the impression it is

Re: Another accumulator question

2014-11-21 Thread Sean Owen
This sounds more like a use case for reduce or fold? It sounds like you're kind of cobbling together the same function on accumulators, when reduce/fold are simpler and have the behavior you suggest. On Fri, Nov 21, 2014 at 5:46 AM, Nathan Kronenfeld nkronenf...@oculusinfo.com wrote: I think I
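
For readers comparing the two approaches discussed in this thread, here is a minimal sketch that sums the same small RDD with reduce, with fold, and with an accumulator. It is an illustration only, not code from any poster. The object name SumSketch, the method name sums, the app name, and the local[2] master are assumptions made for the example, and sc.longAccumulator assumes Spark 2.0 or later (the 1.x releases current when these messages were written used sc.accumulator instead).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object, named for this illustration only.
object SumSketch {

  // Returns the sum computed three ways: reduce, fold, accumulator.
  def sums(): (Int, Int, Long) = {
    // local[2] is an assumption: two worker threads inside one JVM.
    val conf = new SparkConf().setAppName("sum-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(1, 2, 3, 4, 5, 6, 7), 2)

      // reduce and fold combine values inside each partition first,
      // then merge the per-partition results into one value.
      val viaReduce = rdd.reduce(_ + _)
      val viaFold = rdd.fold(0)(_ + _)

      // The accumulator is updated as a side effect of an action;
      // only the driver reads its value afterwards.
      // longAccumulator assumes Spark 2.0+.
      val acc = sc.longAccumulator("sum")
      rdd.foreach(x => acc.add(x.toLong))

      (viaReduce, viaFold, acc.value.longValue)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (r, f, a) = sums()
    println(s"reduce=$r fold=$f accumulator=$a")
  }
}
```

All three produce the same total here. On an input this small, any timing difference mostly reflects job scheduling overhead rather than the summation itself.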

Re: Another accumulator question

2014-11-21 Thread Nathan Kronenfeld
We've done this with reduce - that definitely works. I've reworked the logic to use accumulators because, when it works, it's 5-10x faster. On Fri, Nov 21, 2014 at 4:44 AM, Sean Owen so...@cloudera.com wrote: This sounds more like a use case for reduce? or fold? it sounds like you're kind of

Re: Another accumulator question

2014-11-21 Thread Andrew Ash
Hi Nathan, It sounds like what you're asking for has already been filed as https://issues.apache.org/jira/browse/SPARK-664 Does that ticket match what you're proposing? Andrew On Fri, Nov 21, 2014 at 12:29 PM, Nathan Kronenfeld nkronenf...@oculusinfo.com wrote: We've done this with reduce -

Re: Another accumulator question

2014-11-21 Thread Nathan Kronenfeld
I'm not sure if it's an exact match, or just very close :-) I don't think our problem is the workload on the driver, I think it's just memory - so while the solution proposed there would work, it would also be sufficient for our purposes, I believe, simply to clear each block as soon as it's added

Another accumulator question

2014-11-20 Thread Nathan Kronenfeld
I think I understand what is going on here, but I was hoping someone could confirm (or explain reality if I don't) what I'm seeing. We are collecting data using a rather sizable accumulator - essentially, an array of tens of thousands of entries. All told, about 1.3m of data. If I understand

Accumulator question

2014-10-03 Thread Nathan Kronenfeld
I notice that accumulators register themselves with a private Accumulators object. I don't notice any way to unregister them when one is done. Am I missing something? If not, is there any plan for how to free up that memory? I've a case where we're gathering data from repeated queries using