Unlike map(), where your function is applied to one row at a time, mapPartitions() passes your function an iterator over the entire contents of a partition, and you return another iterator as the output.

I don't do Scala, but here is what I understand from your code snippet. The iterator x can give you every row in the partition, but your function calls x.next only once and returns after consuming that first row. sc.parallelize(1 to 10, 3) splits the range into three partitions, [1, 2, 3], [4, 5, 6] and [7, 8, 9, 10]. That is why your output contains only 1, 4 and 7: they are the first rows of your 3 partitions.
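To make this concrete, here is a minimal plain-Scala sketch with no Spark dependency. `slice` and `mapPartitions` below are hypothetical helpers written only for illustration; they are not Spark's API. `slice` assumes that parallelize cuts a range by index as i * n / numSlices, which reproduces the three partitions listed above. The sketch runs the original function alongside two variants that consume the whole iterator.

```scala
// Plain-Scala sketch of mapPartitions semantics; no Spark dependency.
object MapPartitionsSketch {

  // Hypothetical helper (not Spark API): splits data into numSlices contiguous
  // chunks using the index arithmetic i * n / numSlices, assumed here to match
  // how sc.parallelize splits a range such as 1 to 10.
  def slice[T](data: Seq[T], numSlices: Int): Seq[List[T]] = {
    val n = data.length
    (0 until numSlices).map { i =>
      val start = (i.toLong * n / numSlices).toInt
      val end = ((i + 1).toLong * n / numSlices).toInt
      data.slice(start, end).toList
    }
  }

  // Hypothetical stand-in for RDD.mapPartitions: f is called once per partition
  // with an Iterator over that partition's rows, and must return an Iterator.
  // The per-partition outputs are concatenated in partition order, like collect().
  def mapPartitions[T, U](partitions: Seq[Seq[T]])(f: Iterator[T] => Iterator[U]): Seq[U] =
    partitions.flatMap(p => f(p.iterator))

  def main(args: Array[String]): Unit = {
    val partitions = slice(1 to 10, 3)
    println(partitions.map(_.mkString("[", ", ", "]")).mkString(" "))
    // [1, 2, 3] [4, 5, 6] [7, 8, 9, 10]

    // The original function: reads one row per partition, then stops.
    val firstOnly = mapPartitions(partitions)(x => List(x.next()).iterator)
    println(firstOnly.mkString(", ")) // 1, 4, 7

    // Returning the iterator itself passes every row through.
    val all = mapPartitions(partitions)(x => x)
    println(all.mkString(", ")) // 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

    // Consuming the whole iterator once per partition, e.g. to count rows.
    val sizes = mapPartitions(partitions)(x => Iterator(x.size))
    println(sizes.mkString(", ")) // 3, 3, 4
  }
}
```

In Spark itself, the same functions can be passed directly to parallel.mapPartitions(...). Returning x unchanged gives back all ten numbers, and Iterator(x.size) gives one row count per partition.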
Regards
Sab

On 18-Mar-2015 10:50 pm, "ashish.usoni" <ashish.us...@gmail.com> wrote:
> I am trying to understand about mapPartitions but i am still not sure how
> it works
>
> in the below example it create three partition
> val parallel = sc.parallelize(1 to 10, 3)
>
> and when we do below
> parallel.mapPartitions( x => List(x.next).iterator).collect
>
> it prints value
> Array[Int] = Array(1, 4, 7)
>
> Can some one please explain why it prints 1,4,7 only
>
> Thanks,
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/mapPartitions-How-Does-it-Works-tp22123.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.