try mapPartitionsWithIndex .. below is an example I used earlier. myfunc logic can be further modified as per your need. val x = sc.parallelize(List(1,2,3,4,5,6,7,8,9), 3) def myfunc(index: Int, iter: Iterator[Int]) : Iterator[String] = { iter.toList.map(x => index + "," + x).iterator } x.mapPartitionsWithIndex(myfunc).collect() res10: Array[String] = Array(0,1, 0,2, 0,3, 1,4, 1,5, 1,6, 2,7, 2,8, 2,9)
On Tue, Jan 12, 2016 at 2:06 PM, Gokula Krishnan D <email2...@gmail.com> wrote: > Hello All - > > I'm just trying to understand aggregate() and in the meantime got an > question. > > *Is there any way to view the RDD databased on the partition ?.* > > For the instance, the following RDD has 2 partitions > > val multi2s = List(2,4,6,8,10,12,14,16,18,20) > val multi2s_RDD = sc.parallelize(multi2s,2) > > is there anyway to view the data based on the partitions (0,1). > > > Thanks & Regards, > Gokula Krishnan* (Gokul)* >