mapPartitions - How Does it Works

2015-03-18 Thread ashish.usoni
I am trying to understand mapPartitions, but I am still not sure how it
works.

In the example below, it creates three partitions:
val parallel = sc.parallelize(1 to 10, 3)

and when we do the following:
parallel.mapPartitions(x => List(x.next).iterator).collect

it prints the value
Array[Int] = Array(1, 4, 7)

Can someone please explain why it prints only 1, 4, and 7?

Thanks,




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/mapPartitions-How-Does-it-Works-tp22123.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: mapPartitions - How Does it Works

2015-03-18 Thread java8964
Here is what I think:
mapPartitions is a specialized map that is called only once for each
partition. The entire content of the respective partition is available as a
sequential stream of values via the input argument (Iterator[T]). The
result iterators are automatically combined into a new RDD.
So in this case, the RDD (1, 2, ..., 10) is split into 3 partitions: (1,2,3),
(4,5,6), (7,8,9,10).
For every partition, your function gets the first element with x.next,
uses it to build a List, and returns the iterator from that List.
So the partitions return (1), (4) and (7) as 3 iterators, which are then
combined into one final RDD (1, 4, 7).
Yong
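
The walk-through above can be reproduced without a cluster. What follows is only a plain Scala sketch, not Spark itself: slice and mapPartitionsLocal are hypothetical helpers written for this illustration, and the slicing arithmetic is an assumption chosen so that it yields the (1,2,3), (4,5,6), (7,8,9,10) split described above.

```scala
object MapPartitionsSketch {
  // Hypothetical helper: split `data` into `numSlices` contiguous chunks.
  // Chunk i covers indices [i * n / numSlices, (i + 1) * n / numSlices).
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    val n = data.length
    (0 until numSlices).map { i =>
      data.slice(i * n / numSlices, (i + 1) * n / numSlices)
    }
  }

  // Hypothetical stand-in for mapPartitions: call `f` once per partition
  // with an iterator over that partition, then concatenate the results.
  def mapPartitionsLocal[T, U](partitions: Seq[Seq[T]])(f: Iterator[T] => Iterator[U]): Seq[U] =
    partitions.flatMap(p => f(p.iterator))

  val partitions: Seq[Seq[Int]] = slice(1 to 10, 3)

  // Same function as in the question: keep only the first element.
  val firstOfEach: Seq[Int] =
    mapPartitionsLocal(partitions)(x => List(x.next()).iterator)

  def main(args: Array[String]): Unit = {
    println(partitions.map(_.toList)) // the three simulated partitions
    println(firstOfEach.toList)       // one value per partition
  }
}
```

Because the function is invoked three times (once per partition) and each call yields a single element, the combined result has exactly three values.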


Re: mapPartitions - How Does it Works

2015-03-18 Thread Ganelin, Ilya
mapPartitions works as follows:
1) For each partition of your RDD, it provides an iterator over the values
within that partition
2) You then define a function that operates on that iterator

Thus if you do the following:

val parallel = sc.parallelize(1 to 10, 3)

parallel.mapPartitions(x => x.map(s => s + 1)).collect



You would get:
res3: Array[Int] = Array(2, 3, 4, 5, 6, 7, 8, 9, 10, 11)


In your example, x is not a pointer that traverses the iterator (e.g. with
.next()); it's literally the Iterator object itself, so x.next returns only
the first value of each partition.
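
To make the difference between calling next on the iterator and mapping over it concrete, here is a small plain Scala sketch that needs no Spark. It assumes a single partition holding (1, 2, 3); the object and value names are made up for this illustration.

```scala
object IteratorSketch {
  // A fresh iterator over one assumed partition, (1, 2, 3), on every call.
  def partition: Iterator[Int] = Iterator(1, 2, 3)

  // next() hands back a single element; wrapping it in a List keeps only that one.
  val firstOnly: List[Int] = {
    val x = partition
    List(x.next()).iterator.toList
  }

  // map transforms every element the iterator will produce.
  val allPlusOne: List[Int] = partition.map(s => s + 1).toList

  // After one next() call, the remaining elements are still in the iterator.
  val rest: List[Int] = {
    val x = partition
    x.next()
    x.toList
  }

  def main(args: Array[String]): Unit = {
    println(firstOnly)
    println(allPlusOne)
    println(rest)
  }
}
```

The elements left in the iterator after next() are simply never read by the function in the question, which is why they do not show up in the output.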


Re: mapPartitions - How Does it Works

2015-03-18 Thread Alex Turner (TMS)
List(x.next).iterator is giving you the first element from each partition,
which would be 1, 4 and 7 respectively.




Re: mapPartitions - How Does it Works

2015-03-18 Thread Sabarish Sasidharan
Unlike map(), where your task acts on one row at a time, with
mapPartitions() the task is passed the entire content of the partition in
an iterator. You can then return another iterator as the output. I
don't use Scala, but from what I understand of your code snippet, the
iterator x can return all the rows in the partition, but you are returning
after consuming only the first row. Hence you see only 1, 4, 7 in your
output. These are the first rows of each of your 3 partitions.

Regards
Sab
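
The row-at-a-time versus partition-at-a-time contrast can be sketched by counting how often the user function runs. This is a plain Scala simulation rather than Spark code: the three lists stand in for the partitions discussed in this thread, and the counters and value names are hypothetical.

```scala
object MapVsMapPartitionsSketch {
  // Assumed partition layout, taken from the discussion in this thread.
  val partitions: Seq[List[Int]] =
    Seq(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9, 10))

  // map-style: the function body runs once per row.
  var mapCalls = 0
  val mapped: Seq[Int] = partitions.flatMap { p =>
    p.map { row => mapCalls += 1; row * 2 }
  }

  // mapPartitions-style: the function body runs once per partition and
  // receives an iterator over all rows of that partition.
  var partitionCalls = 0
  val mappedByPartition: Seq[Int] = partitions.flatMap { p =>
    partitionCalls += 1
    val rows: Iterator[Int] = p.iterator
    rows.map(_ * 2) // consume every row, not just the first
  }

  def main(args: Array[String]): Unit = {
    println(s"map-style calls: $mapCalls")
    println(s"partition-style calls: $partitionCalls")
    println(mapped == mappedByPartition)
  }
}
```

Both variants produce the same values here; the difference is only in how many times the function is entered, which matters when each call carries setup work.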