Re: How to view the RDD data based on Partition

2016-01-12 Thread Prem Sure
Try mapPartitionsWithIndex. Below is an example I used earlier; the myfunc logic can be modified further to suit your needs. val x = sc.parallelize(List(1,2,3,4,5,6,7,8,9), 3) def myfunc(index: Int, iter: Iterator[Int]): Iterator[String] = { iter.toList.map(x => index + "," + x).iterator }
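For readers who want to try this outside the shell, here is a minimal sketch of how the suggestion above could be run end to end. The object name PartitionView, the helper tagWithPartition, and the local[2] master are illustrative assumptions, not part of the original message; glom() is shown as an alternative way to look at each partition as one array.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical wrapper object; only mapPartitionsWithIndex comes from the thread.
object PartitionView {
  // Prefix every element with the index of the partition that holds it.
  def tagWithPartition(index: Int, iter: Iterator[Int]): Iterator[String] =
    iter.map(x => s"$index,$x")

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for inspecting a small RDD.
    val conf = new SparkConf().setAppName("PartitionView").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val x = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 3)

      // One "partitionIndex,value" string per element.
      x.mapPartitionsWithIndex(tagWithPartition).collect().foreach(println)

      // glom() gathers each partition into a single array.
      x.glom().collect().zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: ${part.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Both approaches call collect(), so they are only suitable for RDDs small enough to fit in driver memory.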

How to view the RDD data based on Partition

2016-01-12 Thread Gokula Krishnan D
Hello All - I'm just trying to understand aggregate() and in the meantime got a question. *Is there any way to view the RDD data based on the partition?* For instance, the following RDD has 2 partitions: val multi2s = List(2,4,6,8,10,12,14,16,18,20) val multi2s_RDD =

Re: How to view the RDD data based on Partition

2016-01-12 Thread Gokula Krishnan D
Hello Prem - Thanks for sharing. I also found a similar example at the link http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#aggregate but I am still trying to understand the actual functionality and behavior. Thanks & Regards, Gokula Krishnan (Gokul) On Tue, Jan 12, 2016 at

Re: How to view the RDD data based on Partition

2016-01-12 Thread Prem Sure
I explored these examples a couple of months back; it is a very good link for RDD operations. See if the explanation below helps, and try to understand the difference between the 2 examples below. The initial value in both is "". Example 1: val z = sc.parallelize(List("12","23","","345"),2) z.aggregate("")((x,y) =>
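To make the per-partition behavior of aggregate() easier to follow, here is a small local model in plain Scala (no Spark needed). It is a sketch, not Spark's implementation: aggregateLocally and AggregateSketch are hypothetical names, and the two functions passed in are illustrative choices, not the ones from the truncated example above. It assumes the four strings are split into two partitions as ("12", "23") and ("", "345").

```scala
// Hypothetical local model of RDD.aggregate, for illustration only.
object AggregateSketch {
  // Fold each partition with seqOp starting from zero, then merge the
  // per-partition results with combOp, again starting from zero.
  def aggregateLocally[T, U](partitions: Seq[Seq[T]], zero: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    val perPartition = partitions.map(_.foldLeft(zero)(seqOp))
    perPartition.foldLeft(zero)(combOp)
  }

  def main(args: Array[String]): Unit = {
    // Assumed partition layout for List("12", "23", "", "345") with 2 partitions.
    val partitions = Seq(Seq("12", "23"), Seq("", "345"))

    val result = aggregateLocally(partitions, "")(
      (acc, s) => acc + s.length.toString, // append each string's length
      (a, b) => a + b                      // concatenate partition results
    )

    // Partition 0 yields "22", partition 1 yields "03".
    println(result) // prints 2203 in this local model
  }
}
```

In Spark itself the partition results are merged in the order the tasks finish, so with a non-commutative combine function like this one the real job could also return "0322".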