2 is added every time the final partition aggregator is called. The result
of summing the elements across partitions is 9 of course. If you force a
single partition (using spark-shell in local mode):
scala> val data = sc.parallelize(List(2,3,4),1)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 11
The 2nd function is still called, even though there is only one partition
(presumably either x or y is set to 0).
For every additional partition you specify as the 2nd arg. to parallelize,
the 2nd function will be called again:
scala> val data = sc.parallelize(List(2,3,4),1)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 11
scala> val data = sc.parallelize(List(2,3,4),2)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 13
scala> val data = sc.parallelize(List(2,3,4),3)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 15
scala> val data = sc.parallelize(List(2,3,4),4)
scala> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
res0: Int = 17
Hence, it appears that not specifying the 2nd argument resulted in 4
partitions, even though you only had three elements in the list.
If p_i is the ith partition, the final sum appears to be:
(2 + ... (2 + (2 + (2 + 0 + p_1) + p_2) + p_3) ...)
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
On Sun, Mar 22, 2015 at 8:05 AM, ashish.usoni
wrote:
> Hi ,
> I am not able to understand how aggregate function works, Can some one
> please explain how below result came
> I am running spark using cloudera VM
>
> The result in below is 17 but i am not able to find out how it is
> calculating 17
> val data = sc.parallelize(List(2,3,4))
> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
> *res21: Int = 17*
>
> Also when i try to change the 2nd parameter in sc.parallelize i get
> different result
>
> val data = sc.parallelize(List(2,3,4),2)
> data.aggregate(0)((x,y) => x+y,(x,y) => 2+x+y)
> *res21: Int = 13*
>
> Thanks for the help.
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-Does-aggregate-work-tp22179.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>