I had explored these examples couple of months back. very good link for RDD operations. see if below explanation helps, try to understand the difference between below 2 examples.. initial value in both is """ Example 1; val z = sc.parallelize(List("12","23","","345"),2) z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x + y) res144: String = 11
Partition 1 ("12", "23") ("","12") => "0" . here "" is the initial value ("0","23") => "1" -- here above 0 is used again for the next element with in the same partition Partition 2 ("","345") ("","") => "0" -- resulting length is 0 ("0","345") => "1" -- zero is again used and min length becomes 1 Final merge: ("1","1") => "11" Example 2: val z = sc.parallelize(List("12","23","345",""),2) z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x + y) res143: String = 10 Partition 1 ("12", "23") ("","12") => "0" . here "" is the initial value ("0","23") => "1" -- here above 0 is used again for the next element with in the same partition Partition 2 ("345","") ("","345") => "0" -- resulting length is 0 ("0","") => "0" -- min length becomes zero again. Final merge: ("1","0") => "10" Hope this helps On Tue, Jan 12, 2016 at 2:53 PM, Gokula Krishnan D <email2...@gmail.com> wrote: > Hello Prem - > > Thanks for sharing and I also found the similar example from the link > http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#aggregate > > > But trying the understand the actual functionality or behavior. > > Thanks & Regards, > Gokula Krishnan* (Gokul)* > > On Tue, Jan 12, 2016 at 2:50 PM, Prem Sure <premsure...@gmail.com> wrote: > >> try mapPartitionsWithIndex .. below is an example I used earlier. myfunc >> logic can be further modified as per your need. >> val x = sc.parallelize(List(1,2,3,4,5,6,7,8,9), 3) >> def myfunc(index: Int, iter: Iterator[Int]) : Iterator[String] = { >> iter.toList.map(x => index + "," + x).iterator >> } >> x.mapPartitionsWithIndex(myfunc).collect() >> res10: Array[String] = Array(0,1, 0,2, 0,3, 1,4, 1,5, 1,6, 2,7, 2,8, 2,9) >> >> On Tue, Jan 12, 2016 at 2:06 PM, Gokula Krishnan D <email2...@gmail.com> >> wrote: >> >>> Hello All - >>> >>> I'm just trying to understand aggregate() and in the meantime got an >>> question. >>> >>> *Is there any way to view the RDD databased on the partition ?.* >>> >>> For the instance, the following RDD has 2 partitions >>> >>> val multi2s = List(2,4,6,8,10,12,14,16,18,20) >>> val multi2s_RDD = sc.parallelize(multi2s,2) >>> >>> is there anyway to view the data based on the partitions (0,1). >>> >>> >>> Thanks & Regards, >>> Gokula Krishnan* (Gokul)* >>> >> >> >