RE: mapPartitions - How Does it Works

java8964 Wed, 18 Mar 2015 10:40:45 -0700

Here is what I think:
mapPartitions is a specialized map whose function is called only once per 
partition, not once per element. The entire content of each partition is 
available as a sequential stream of values via the input argument 
(Iterator[T]). The iterators returned for the individual partitions are 
automatically combined into a new RDD.
So in this case, the RDD (1, 2, ..., 10) is split into 3 partitions: (1,2,3), 
(4,5,6), (7,8,9,10).
For every partition, your function takes only the first element with x.next, 
uses it to build a List, and returns the iterator of that List. The remaining 
elements of the partition are never read.
So the partitions return (1), (4) and (7) as 3 iterators, which are then 
combined into one final RDD (1, 4, 7).
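
To make this concrete, here is a minimal sketch in plain Scala that imitates 
the behaviour without Spark. It is only an illustration: simulateMapPartitions 
is a hypothetical helper I made up (it is not part of the Spark API), and the 
partition contents are hard-coded from the split described above rather than 
computed by Spark.

```scala
object MapPartitionsSketch {
  // Hypothetical stand-in for RDD.mapPartitions (not Spark code):
  // call f exactly once per partition, passing an iterator over that
  // partition, then concatenate the iterators that f returns.
  def simulateMapPartitions[T, U](partitions: Seq[Seq[T]])(
      f: Iterator[T] => Iterator[U]): Seq[U] =
    partitions.flatMap(p => f(p.iterator))

  // Assumed split of 1 to 10 into 3 partitions, as described above.
  val partitions: Seq[Seq[Int]] =
    Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9, 10))

  // Same function as in the question: read only the first element of
  // each partition and ignore the rest.
  val firstOnly: Seq[Int] =
    simulateMapPartitions(partitions)(x => List(x.next()).iterator)

  // For contrast: a function that consumes the whole iterator keeps
  // every element of every partition.
  val doubled: Seq[Int] =
    simulateMapPartitions(partitions)(x => x.map(_ * 2))

  def main(args: Array[String]): Unit = {
    println(firstOnly) // one element per partition: 1, 4, 7
    println(doubled)   // all ten elements, each doubled
  }
}
```

On a real RDD, parallel.glom().collect() should show the actual content of 
each partition as an array, which is a quick way to check the split.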
Yong


> Date: Wed, 18 Mar 2015 10:19:34 -0700
> From: ashish.us...@gmail.com
> To: user@spark.apache.org
> Subject: mapPartitions - How Does it Works
> 
> I am trying to understand mapPartitions, but I am still not sure how it
> works.
> 
> In the example below, it creates three partitions:
> val parallel = sc.parallelize(1 to 10, 3)
> 
> and when we run the following:
> parallel.mapPartitions( x => List(x.next).iterator).collect
> 
> it prints the value
> Array[Int] = Array(1, 4, 7)
> 
> Can someone please explain why it prints only 1, 4, 7?
> 
> Thanks,
