Re: how to specify columns in groupby

2014-08-29 Thread MEETHU MATHEW
Thank you, Yanbo, for the reply.

I have another query, related to cogroup. I want to iterate over the results of 
a cogroup operation.

My code is:
grp = RDD1.cogroup(RDD2)
map((lambda (x,y): (x,list(y[0]),list(y[1]))), grp.collect())
My result looks like:

[((u'764', u'20140826'), [0.70146274566650391], [ ]),
 ((u'863', u'20140826'), [0.368011474609375], [ ]),
 ((u'9571520', u'20140826'), [0.0046129226684570312], [0.60009])]
 
When I do one more cogroup operation, like

grp1 = grp.cogroup(RDD3)

I am not able to see the results. All my RDDs are of the form ((x,y),z). Can 
somebody help me solve this?

Thanks & Regards, 
Meethu M
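
A note on the shape of the data, which may be what makes the second result hard to read. cogroup returns (key, (values_from_left, values_from_right)), so cogrouping that result with a third RDD nests the first pair inside the new left-hand group. The sketch below is plain Python, not Spark: cogroup here is a hypothetical stand-in that returns lists where Spark returns iterables, and the RDD3 record is made up for illustration.

```python
from collections import defaultdict


def cogroup(*datasets):
    """Hypothetical stand-in for cogroup: one list of values per input, per key."""
    grouped = defaultdict(lambda: tuple([] for _ in datasets))
    for index, pairs in enumerate(datasets):
        for key, value in pairs:
            grouped[key][index].append(value)
    # Sort by key so the output order is predictable.
    return sorted(grouped.items())


# Records of the form ((x, y), z), as in the question.
rdd1 = [(("764", "20140826"), 0.7), (("9571520", "20140826"), 0.0046)]
rdd2 = [(("9571520", "20140826"), 0.60009)]
rdd3 = [(("764", "20140826"), 3.0)]  # made-up third dataset

grp = cogroup(rdd1, rdd2)

# Chaining a second cogroup: each left-hand value is itself a pair of lists.
nested = cogroup(grp, rdd3)

# Grouping all three inputs in one step keeps the result flat.
flat = cogroup(rdd1, rdd2, rdd3)

for key, groups in nested:
    print(key, groups)
for key, groups in flat:
    print(key, groups)
```

With real RDDs the nested groups would need list() applied at both levels before printing. PySpark also documents groupWith, which accepts several RDDs at once; whether it is available in the version used here would need checking.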


On Thursday, 28 August 2014 5:59 PM, Yanbo Liang yanboha...@gmail.com wrote:
 


For your reference:

// Key each record by (id, date) and keep the cost as a Double.
val d1 = textFile.map(line => {
  val fields = line.split(",")
  ((fields(0), fields(1)), fields(2).toDouble)
})

val d2 = d1.reduceByKey(_+_)
d2.foreach(println)





how to specify columns in groupby

2014-08-28 Thread MEETHU MATHEW
Hi all,

I have an RDD which has values in the format id,date,cost.

I want to group the elements based on the id and date columns and get the sum 
of the cost for each group.

Can somebody tell me how to do this?

 
Thanks & Regards, 
Meethu M

Re: how to specify columns in groupby

2014-08-28 Thread Yanbo Liang
For your reference:

// Key each record by (id, date) and keep the cost as a Double.
val d1 = textFile.map(line => {
  val fields = line.split(",")
  ((fields(0), fields(1)), fields(2).toDouble)
})

val d2 = d1.reduceByKey(_+_)
d2.foreach(println)
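
For anyone following along in Python, the same idea (key each record by (id, date), then sum the costs per key) can be sketched without a cluster. This is plain Python, not Spark: reduce_by_key is a hypothetical helper that only imitates what reduceByKey does, and the sample lines are made up for illustration.

```python
from operator import add


def parse(line):
    """Split an 'id,date,cost' line into ((id, date), cost)."""
    fields = line.split(",")
    return ((fields[0], fields[1]), float(fields[2]))


def reduce_by_key(pairs, func):
    """Hypothetical stand-in for reduceByKey: fold the values of each key with func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Made-up sample input in the id,date,cost format.
lines = [
    "764,20140826,0.5",
    "764,20140826,0.25",
    "863,20140826,1.0",
]

totals = reduce_by_key((parse(line) for line in lines), add)

for key, total in sorted(totals.items()):
    print(key, total)
```

In PySpark the corresponding call would be rdd.map(parse).reduceByKey(add), assuming rdd holds the raw text lines.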


2014-08-28 20:04 GMT+08:00 MEETHU MATHEW meethu2...@yahoo.co.in:

 Hi all,

 I have an RDD which has values in the format id,date,cost.

 I want to group the elements based on the id and date columns and get the
 sum of the cost for each group.

 Can somebody tell me how to do this?


 Thanks & Regards,
 Meethu M