Re: unable to do group by with 1st column

2014-12-28 Thread Michael Albert
Here is a sketch of what you need to do, off the top of my head and based on a guess of what your RDD is like:

    val in: RDD[(K, Seq[(C, V)])] = ...
    in.flatMap { case (key, colVals) =>
      colVals.map { case (col, value) =>
        (col, (key, value))
      }
    }.groupByKey

So
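The re-keying idea in this message can be tried without a cluster. The following is only a rough sketch that uses plain Scala collections as a stand-in for the RDD: `RegroupByColumn`, `regroup`, and the sample data are hypothetical names made up for illustration, and `groupBy` followed by `map` here plays the role that `groupByKey` plays on a pair RDD.

```scala
object RegroupByColumn {
  // Assumed input shape (a guess, as in the message above): each row key
  // carries a sequence of (column, value) pairs.
  def regroup[K, C, V](in: Seq[(K, Seq[(C, V)])]): Map[C, Seq[(K, V)]] =
    in
      // Emit one (column, (rowKey, value)) record per cell.
      .flatMap { case (key, colVals) =>
        colVals.map { case (col, value) => (col, (key, value)) }
      }
      // Local stand-in for groupByKey: bucket the records by column...
      .groupBy(_._1)
      // ...and drop the now-redundant column from each grouped record.
      .map { case (col, pairs) => col -> pairs.map(_._2) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, not taken from the thread.
    val in = Seq(
      ("row1", Seq(("a", 1), ("b", 2))),
      ("row2", Seq(("a", 3)))
    )
    regroup(in).toSeq.sortBy(_._1).foreach { case (col, cells) =>
      println(s"$col -> ${cells.mkString(", ")}")
    }
  }
}
```

On a real RDD the same `flatMap` body should carry over unchanged, with `.groupByKey` replacing the `groupBy`/`map` pair; the order of values within a group is not guaranteed there.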

Re: unable to do group by with 1st column

2014-12-28 Thread Sean Owen
5g? Anyway, thanks for the info. Best wishes, Mike From: Sean Owen so...@cloudera.com To: Michael Albert m_albert...@yahoo.com Cc: user@spark.apache.org Sent: Friday, December 26, 2014 3:23 PM Subject: Re: unable to do group by with 1st column Here

RE: unable to do group by with 1st column

2014-12-26 Thread Sean Owen
:* Tobias Pfeiffer [mailto:t...@preferred.jp] *Sent:* Friday, December 26, 2014 6:35 AM *To:* Amit Behera *Cc:* u...@spark.incubator.apache.org *Subject:* Re: unable to do group by with 1st column Hi, On Fri, Dec 26, 2014 at 5:22 AM, Amit Behera amit.bd...@gmail.com wrote: How can I do

Re: unable to do group by with 1st column

2014-12-26 Thread Amit Behera
...@spark.incubator.apache.org *Subject:* Re: unable to do group by with 1st column Hi, On Fri, Dec 26, 2014 at 5:22 AM, Amit Behera amit.bd...@gmail.com wrote: How can I do it? Please help me to do. Have you considered using groupByKey? http://spark.apache.org/docs/latest

Re: unable to do group by with 1st column

2014-12-26 Thread Michael Albert
there resubmitting the shuffle phase. Happy holidays, all! -Mike From: Amit Behera amit.bd...@gmail.com To: u...@spark.incubator.apache.org Sent: Thursday, December 25, 2014 3:22 PM Subject: unable to do group by with 1st column Hi Users, I am reading a csv file and my data format is like

Re: unable to do group by with 1st column

2014-12-26 Thread Sean Owen
*Sent:* Thursday, December 25, 2014 3:22 PM *Subject:* unable to do group by with 1st column Hi Users, I am reading a csv file and my data format is like : key1,value1 key1,value2 key1,value1 key1,value3 key2,value1 key2,value5 key2,value5 key2,value4 key1,value4 key1,value4 key3

unable to do group by with 1st column

2014-12-25 Thread Amit Behera
Hi Users, I am reading a csv file and my data format is like:

key1,value1
key1,value2
key1,value1
key1,value3
key2,value1
key2,value5
key2,value5
key2,value4
key1,value4
key1,value4
key3,value1
key3,value1
key3,value2

required output:

key1:[value1,value2,value1,value3,value4,value4]
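For readers who want to see the requested grouping concretely, here is a minimal sketch using plain Scala collections rather than Spark, so that it runs without a cluster. `GroupByFirstColumn` and `groupByFirstColumn` are hypothetical names chosen for illustration; with a pair RDD, the `groupBy`/`map` step would correspond to the `groupByKey` transformation suggested elsewhere in this thread.

```scala
object GroupByFirstColumn {
  // Hypothetical helper: split each "key,value" line on the first comma
  // and collect the values per key in the order they appear.
  def groupByFirstColumn(lines: Seq[String]): Map[String, Seq[String]] =
    lines
      .map(_.split(",", 2))
      // Skip malformed lines that contain no comma.
      .collect { case Array(k, v) => (k.trim, v.trim) }
      // Local stand-in for groupByKey on a pair RDD.
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2) }

  def main(args: Array[String]): Unit = {
    // The rows quoted in the message above.
    val lines = Seq(
      "key1,value1", "key1,value2", "key1,value1", "key1,value3",
      "key2,value1", "key2,value5", "key2,value5", "key2,value4",
      "key1,value4", "key1,value4",
      "key3,value1", "key3,value1", "key3,value2"
    )
    groupByFirstColumn(lines).toSeq.sortBy(_._1).foreach { case (k, vs) =>
      println(s"$k:[${vs.mkString(",")}]")
    }
  }
}
```

Local collections keep values in input order within each key; a distributed `groupByKey` makes no such ordering promise, so any ordering requirement would need to be handled explicitly.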

Re: unable to do group by with 1st column

2014-12-25 Thread Tobias Pfeiffer
Hi, On Fri, Dec 26, 2014 at 5:22 AM, Amit Behera amit.bd...@gmail.com wrote: How can I do it? Please help me to do. Have you considered using groupByKey? http://spark.apache.org/docs/latest/programming-guide.html#transformations Tobias

RE: unable to do group by with 1st column

2014-12-25 Thread Somnath Pandeya
; } }); From: Tobias Pfeiffer [mailto:t...@preferred.jp] Sent: Friday, December 26, 2014 6:35 AM To: Amit Behera Cc: u...@spark.incubator.apache.org Subject: Re: unable to do group by with 1st column Hi, On Fri, Dec 26, 2014 at 5:22 AM, Amit Behera amit.bd...@gmail.commailto:amit.bd