Re: Accumulator question
On May 9, 2016 8:24:06 PM EDT, Abiwrote: >I am splitting an integer array in 2 partitions and using an >accumulator to sum the array. problem is > >1. I am not seeing execution time becoming half of a linear summing. > >2. The second node (from looking at timestamps) takes 3 times as long >as the first node. This gives the impression it is "waiting" for the >first node to finish. > >Hence, I am given the impression using accumulator.sum () in the >kernel and rdd.foreach (kernel) is making things sequential. > >Any api/setting suggestions where I could make things parallel ? > > >
Re: Accumulator question
Your mail does not describe much , but wont a simple reduce function help you ? Something like as below val data = Seq(1,2,3,4,5,6,7) val rdd = sc.parallelize(data, 2) val sum = rdd.reduce((a,b) => a+b) Regards, Rishitesh Mishra, SnappyData . (http://www.snappydata.io/) https://in.linkedin.com/in/rishiteshmishra On Tue, May 10, 2016 at 10:44 AM, Abiwrote: > I am splitting an integer array in 2 partitions and using an accumulator > to sum the array. problem is > > 1. I am not seeing execution time becoming half of a linear summing. > > 2. The second node (from looking at timestamps) takes 3 times as long as > the first node. This gives the impression it is "waiting" for the first > node to finish. > > Hence, I am given the impression using accumulator.sum () in the kernel > and rdd.foreach (kernel) is making things sequential. > > Any api/setting suggestions where I could make things parallel ? > > On Mon, May 9, 2016 at 8:24 PM, Abi wrote: > >> I am splitting an integer array in 2 partitions and using an accumulator >> to sum the array. problem is >> >> 1. I am not seeing execution time becoming half of a linear summing. >> >> 2. The second node (from looking at timestamps) takes 3 times as long as >> the first node. This gives the impression it is "waiting" for the first >> node to finish. >> >> Hence, I am given the impression using accumulator.sum () in the kernel >> and rdd.foreach (kernel) is making things sequential. >> >> Any api/setting suggestions where I could make things parallel ? >> >> >> >
Re: Accumulator question
I am splitting an integer array in 2 partitions and using an accumulator to sum the array. problem is 1. I am not seeing execution time becoming half of a linear summing. 2. The second node (from looking at timestamps) takes 3 times as long as the first node. This gives the impression it is "waiting" for the first node to finish. Hence, I am given the impression using accumulator.sum () in the kernel and rdd.foreach (kernel) is making things sequential. Any api/setting suggestions where I could make things parallel ? On Mon, May 9, 2016 at 8:24 PM, Abiwrote: > I am splitting an integer array in 2 partitions and using an accumulator > to sum the array. problem is > > 1. I am not seeing execution time becoming half of a linear summing. > > 2. The second node (from looking at timestamps) takes 3 times as long as > the first node. This gives the impression it is "waiting" for the first > node to finish. > > Hence, I am given the impression using accumulator.sum () in the kernel > and rdd.foreach (kernel) is making things sequential. > > Any api/setting suggestions where I could make things parallel ? > > >