Here is what I think: mapPartitions is a specialized map that is called only once for each partition. The entire content of the respective partition is available as a sequential stream of values via the input argument (Iterator[T]). The iterators returned for the individual partitions are automatically combined into a new RDD.

So in this case, the RDD (1, 2, ..., 10) is split into 3 partitions: (1,2,3), (4,5,6) and (7,8,9,10). For every partition, your function takes the first element with x.next, uses it to build a List, and returns the iterator of that List. So the partitions return (1), (4) and (7) as 3 iterators, which are then combined into one final RDD (1, 4, 7).

Yong
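To make the steps above concrete, here is a minimal sketch in plain Scala that imitates the behaviour without Spark. The names slice and mapPartitionsLocal are hypothetical helpers written for this illustration, not Spark APIs, and the slicing arithmetic is only an assumption chosen to reproduce the (1,2,3), (4,5,6), (7,8,9,10) split described above.

```scala
object MapPartitionsSketch {
  // Hypothetical helper (not a Spark API): cut a sequence into numSlices
  // contiguous chunks. The index arithmetic is an assumption chosen so that
  // 1 to 10 with 3 slices gives (1,2,3), (4,5,6), (7,8,9,10).
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
    (0 until numSlices).map { i =>
      val start = (i * data.length) / numSlices
      val end = ((i + 1) * data.length) / numSlices
      data.slice(start, end)
    }

  // Hypothetical stand-in for RDD.mapPartitions: f is called exactly once per
  // partition with an iterator over that partition, and the iterators it
  // returns are concatenated into one result.
  def mapPartitionsLocal[T, U](partitions: Seq[Seq[T]])(f: Iterator[T] => Iterator[U]): Seq[U] =
    partitions.flatMap(p => f(p.iterator))

  def main(args: Array[String]): Unit = {
    val partitions = slice(1 to 10, 3)
    println(partitions.map(_.toList).toList)
    // List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9, 10))

    // Same function as in the question: keep only the first element of each partition.
    val firsts = mapPartitionsLocal(partitions)(x => List(x.next()).iterator)
    println(firsts.toList) // List(1, 4, 7)

    // For contrast: consume the whole iterator and emit one sum per partition.
    val sums = mapPartitionsLocal(partitions)(x => Iterator(x.sum))
    println(sums.toList) // List(6, 15, 34)
  }
}
```

Note that calling next on an empty iterator throws NoSuchElementException, so a function like the one in the question would fail on an empty partition unless it checks hasNext first.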
> Date: Wed, 18 Mar 2015 10:19:34 -0700
> From: ashish.us...@gmail.com
> To: user@spark.apache.org
> Subject: mapPartitions - How Does it Works
>
> I am trying to understand about mapPartitions but i am still not sure how it
> works
>
> in the below example it create three partition
> val parallel = sc.parallelize(1 to 10, 3)
>
> and when we do below
> parallel.mapPartitions( x => List(x.next).iterator).collect
>
> it prints value
> Array[Int] = Array(1, 4, 7)
>
> Can some one please explain why it prints 1,4,7 only
>
> Thanks,
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/mapPartitions-How-Does-it-Works-tp22123.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.