2 is added every time the final partition aggregator is called. The result
of summing the elements across partitions is 9 of course. If you force a
single partition (using spark-shell in local mode):
scala val data = sc.parallelize(List(2,3,4),1)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 11
The 2nd function is still called, even though there is only one partition
(presumably either x or y is set to 0).
For every additional partition you specify as the 2nd arg. to parallelize,
the 2nd function will be called again:
scala val data = sc.parallelize(List(2,3,4),1)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 11
scala val data = sc.parallelize(List(2,3,4),2)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 13
scala val data = sc.parallelize(List(2,3,4),3)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 15
scala val data = sc.parallelize(List(2,3,4),4)
scala data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
res0: Int = 17
Hence, it appears that not specifying the 2nd argument resulted in 4
partitions, even though you only had three elements in the list.
If p_i is the ith partition, the final sum appears to be:
(2 + ... (2 + (2 + (2 + 0 + p_1) + p_2) + p_3) ...)
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
http://shop.oreilly.com/product/0636920033073.do (O'Reilly)
Typesafe http://typesafe.com
@deanwampler http://twitter.com/deanwampler
http://polyglotprogramming.com
On Sun, Mar 22, 2015 at 8:05 AM, ashish.usoni ashish.us...@gmail.com
wrote:
Hi ,
I am not able to understand how aggregate function works, Can some one
please explain how below result came
I am running spark using cloudera VM
The result in below is 17 but i am not able to find out how it is
calculating 17
val data = sc.parallelize(List(2,3,4))
data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
*res21: Int = 17*
Also when i try to change the 2nd parameter in sc.parallelize i get
different result
val data = sc.parallelize(List(2,3,4),2)
data.aggregate(0)((x,y) = x+y,(x,y) = 2+x+y)
*res21: Int = 13*
Thanks for the help.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-Does-aggregate-work-tp22179.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org