CombineByKey - Please explain how it works

2015-03-24 Thread ashish.usoni
I am reading about combineByKey and going through the example below from a blog post,
but I can't understand how it works step by step. Can someone please explain?


case class Fruit(kind: String, weight: Int) {
  def makeJuice: Juice = Juice(weight * 100)
}
case class Juice(volumn: Int) {
  def add(j: Juice): Juice = Juice(volumn + j.volumn)
}

val apple1  = Fruit("apple", 5)
val apple2  = Fruit("apple", 8)
val orange1 = Fruit("orange", 10)

val fruit = sc.parallelize(List(("apple", apple1), ("orange", orange1), ("apple", apple2)))

val juice = fruit.combineByKey(
  (f: Fruit) => f.makeJuice,                  // createCombiner: first value seen for a key in a partition
  (j: Juice, f: Fruit) => j.add(f.makeJuice), // mergeValue: fold another value for the same key into the combiner
  (j1: Juice, j2: Juice) => j1.add(j2)        // mergeCombiners: merge per-partition combiners for the same key
)
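
For intuition, here is a minimal sketch of which of the three functions fires for each
record. It assumes a hypothetical layout where partition 0 holds ("apple", apple1) and
("orange", orange1) and partition 1 holds ("apple", apple2); the actual split depends on
how the RDD gets partitioned.

// Partition 0:
//   ("apple", apple1)   -- first value for "apple" here  -> createCombiner: apple1.makeJuice  == Juice(500)
//   ("orange", orange1) -- first value for "orange" here -> createCombiner: orange1.makeJuice == Juice(1000)
// Partition 1:
//   ("apple", apple2)   -- first value for "apple" here  -> createCombiner: apple2.makeJuice  == Juice(800)
// After the shuffle, combiners for the same key are merged:
//   "apple":  mergeCombiners(Juice(500), Juice(800)) == Juice(1300)
//   "orange": only one combiner, stays Juice(1000)
juice.collect()
// expected: Array(("apple", Juice(1300)), ("orange", Juice(1000)))

Note that under this layout mergeValue never fires, since no key gets a second value in
the same partition; it would be called if, for example, both apple records landed in the
same partition.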






How Does aggregate work

2015-03-22 Thread ashish.usoni
Hi,
I am not able to understand how the aggregate function works. Can someone please explain
how the result below is computed?
I am running spark using cloudera VM 

The result below is 17, but I am not able to work out how it arrives at 17.

val data = sc.parallelize(List(2, 3, 4))
data.aggregate(0)((x, y) => x + y, (x, y) => 2 + x + y)
res21: Int = 17
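
That 17 is consistent with the default parallelism having created four partitions (an
assumption -- check data.partitions.size). Each partition is folded with the first
function (the seqOp) starting from the zero value 0, and the second function (the combOp)
then folds the per-partition results together, again starting from 0, adding an extra 2
for every partition it merges in:

// assumed split into 4 partitions, e.g. [], [2], [3], [4]
// seqOp (x + y) within each partition, starting from 0:  0, 2, 3, 4
// combOp (2 + x + y), folding the partition results from 0:
//   2 + 0 + 0  = 2
//   2 + 2 + 2  = 6
//   2 + 6 + 3  = 11
//   2 + 11 + 4 = 17
// in general here: result = 2 * numPartitions + (2 + 3 + 4)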

Also, when I change the second parameter of sc.parallelize I get a different result:

val data = sc.parallelize(List(2, 3, 4), 2)
data.aggregate(0)((x, y) => x + y, (x, y) => 2 + x + y)
res21: Int = 13
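
With two partitions the same reasoning gives 13; a sketch, assuming the list is split as
[2] and [3, 4]:

// seqOp within each partition, from 0:  partition 0: 0 + 2 = 2;  partition 1: 0 + 3 + 4 = 7
// combOp, folding from 0:  2 + 0 + 2 = 4, then 2 + 4 + 7 = 13
// again: result = 2 * numPartitions + (2 + 3 + 4) = 2 * 2 + 9 = 13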

Thanks for the help.








mapPartitions - How Does It Work

2015-03-18 Thread ashish.usoni
I am trying to understand mapPartitions, but I am still not sure how it works.

In the example below it creates three partitions:
val parallel = sc.parallelize(1 to 10, 3)

and when we do the following:
parallel.mapPartitions(x => List(x.next).iterator).collect

it prints:
Array[Int] = Array(1, 4, 7)

Can someone please explain why it prints only 1, 4, 7?
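
A sketch of what is likely going on, assuming sc.parallelize(1 to 10, 3) splits the range
as [1, 2, 3], [4, 5, 6], [7, 8, 9, 10] (the exact split is an implementation detail):
mapPartitions calls the function once per partition and hands it an iterator over that
partition's elements, so x.next takes only the first element of each partition.

// assumed partition contents:
//   partition 0: 1, 2, 3      -> x.next == 1
//   partition 1: 4, 5, 6      -> x.next == 4
//   partition 2: 7, 8, 9, 10  -> x.next == 7
// collect gathers the three single-element iterators into Array(1, 4, 7)

// To see every element with its partition index, something like:
parallel.mapPartitionsWithIndex((i, it) => it.map(v => (i, v))).collect
// e.g. Array((0,1), (0,2), (0,3), (1,4), (1,5), (1,6), (2,7), (2,8), (2,9), (2,10))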

Thanks,



