Re: How to view the RDD data based on Partition

Prem Sure Tue, 12 Jan 2016 12:15:06 -0800

I had explored these examples couple of months back. very good link for RDD
operations. see if below explanation helps, try to understand the
difference between below 2 examples.. initial value in both is """
Example 1;
val z = sc.parallelize(List("12","23","","345"),2)
z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x
+ y)
res144: String = 11


Partition 1 ("12", "23")
("","12") => "0" . here  "" is the initial value
("0","23") => "1" -- here above 0 is used again for the next element with
in the same partition

Partition 2 ("","345")
("","") => "0"   -- resulting length is 0
("0","345") => "1"  -- zero is again used and min length becomes 1

Final merge:
("1","1") => "11"

Example 2:
val z = sc.parallelize(List("12","23","345",""),2)
z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x
+ y)
res143: String = 10

Partition 1 ("12", "23")
("","12") => "0" . here  "" is the initial value
("0","23") => "1" -- here above 0 is used again for the next element with
in the same partition

Partition 2 ("345","")
("","345") => "0"   -- resulting length is 0
("0","") => "0"  -- min length becomes zero again.

Final merge:
("1","0") => "10"

Hope this helps


On Tue, Jan 12, 2016 at 2:53 PM, Gokula Krishnan D <email2...@gmail.com>
wrote:

> Hello Prem -
>
> Thanks for sharing and I also found the similar example from the link
> http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#aggregate
>
>
> But trying the understand the actual functionality or behavior.
>
> Thanks & Regards,
> Gokula Krishnan* (Gokul)*
>
> On Tue, Jan 12, 2016 at 2:50 PM, Prem Sure <premsure...@gmail.com> wrote:
>
>>  try mapPartitionsWithIndex .. below is an example I used earlier. myfunc
>> logic can be further modified as per your need.
>> val x = sc.parallelize(List(1,2,3,4,5,6,7,8,9), 3)
>> def myfunc(index: Int, iter: Iterator[Int]) : Iterator[String] = {
>>   iter.toList.map(x => index + "," + x).iterator
>> }
>> x.mapPartitionsWithIndex(myfunc).collect()
>> res10: Array[String] = Array(0,1, 0,2, 0,3, 1,4, 1,5, 1,6, 2,7, 2,8, 2,9)
>>
>> On Tue, Jan 12, 2016 at 2:06 PM, Gokula Krishnan D <email2...@gmail.com>
>> wrote:
>>
>>> Hello All -
>>>
>>> I'm just trying to understand aggregate() and in the meantime got an
>>> question.
>>>
>>> *Is there any way to view the RDD databased on the partition ?.*
>>>
>>> For the instance, the following RDD has 2 partitions
>>>
>>> val multi2s = List(2,4,6,8,10,12,14,16,18,20)
>>> val multi2s_RDD = sc.parallelize(multi2s,2)
>>>
>>> is there anyway to view the data based on the partitions (0,1).
>>>
>>>
>>> Thanks & Regards,
>>> Gokula Krishnan* (Gokul)*
>>>
>>
>>
>

Re: How to view the RDD data based on Partition

Reply via email to