Re: How to view the RDD data based on Partition

Gokula Krishnan D Tue, 12 Jan 2016 11:54:11 -0800

Hello Prem -

Thanks for sharing and I also found the similar example from the link
http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#aggregate



But trying the understand the actual functionality or behavior.

Thanks & Regards,
Gokula Krishnan* (Gokul)*

On Tue, Jan 12, 2016 at 2:50 PM, Prem Sure <premsure...@gmail.com> wrote:

>  try mapPartitionsWithIndex .. below is an example I used earlier. myfunc
> logic can be further modified as per your need.
> val x = sc.parallelize(List(1,2,3,4,5,6,7,8,9), 3)
> def myfunc(index: Int, iter: Iterator[Int]) : Iterator[String] = {
>   iter.toList.map(x => index + "," + x).iterator
> }
> x.mapPartitionsWithIndex(myfunc).collect()
> res10: Array[String] = Array(0,1, 0,2, 0,3, 1,4, 1,5, 1,6, 2,7, 2,8, 2,9)
>
> On Tue, Jan 12, 2016 at 2:06 PM, Gokula Krishnan D <email2...@gmail.com>
> wrote:
>
>> Hello All -
>>
>> I'm just trying to understand aggregate() and in the meantime got an
>> question.
>>
>> *Is there any way to view the RDD databased on the partition ?.*
>>
>> For the instance, the following RDD has 2 partitions
>>
>> val multi2s = List(2,4,6,8,10,12,14,16,18,20)
>> val multi2s_RDD = sc.parallelize(multi2s,2)
>>
>> is there anyway to view the data based on the partitions (0,1).
>>
>>
>> Thanks & Regards,
>> Gokula Krishnan* (Gokul)*
>>
>
>

Re: How to view the RDD data based on Partition

Reply via email to