Re: unable to do group by with 1st column
Here is a sketch of what you need to do, off the top of my head and based on a
guess of what your RDD is like:

    val in: RDD[(K, Seq[(C, V)])] = ...
    in.flatMap { case (key, colVals) =>
      colVals.map { case (col, value) =>
        (col, (key, value))
      }
    }.groupByKey
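
For anyone who wants to try the regrouping without a cluster, here is a rough sketch on plain Scala collections. It is only an illustration: `Regroup`, `byColumn`, and the sample rows are made-up names, and a `Seq` stands in for the RDD, so `groupBy` followed by `map` plays the role that `groupByKey` would play on a pair RDD.

```scala
object Regroup {
  // Turn (key, Seq((col, value))) rows into col -> Seq((key, value)).
  // Assumption: a Seq stands in for the RDD; on an RDD the grouping step
  // would be groupByKey instead of groupBy + map.
  def byColumn[K, C, V](in: Seq[(K, Seq[(C, V)])]): Map[C, Seq[(K, V)]] =
    in.flatMap { case (key, colVals) =>
      // Re-key every cell by its column, carrying the row key along.
      colVals.map { case (col, value) => (col, (key, value)) }
    }.groupBy(_._1)
      .map { case (col, pairs) => col -> pairs.map(_._2) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data: two rows with columns "a" and "b".
    val rows = Seq(
      "row1" -> Seq("a" -> 1, "b" -> 2),
      "row2" -> Seq("a" -> 3)
    )
    byColumn(rows).foreach { case (col, vals) => println(col + " -> " + vals) }
  }
}
```

Note that the pattern variable is called `value` rather than `val`, because `val` is a reserved word in Scala and cannot be used as a name.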
5g?
Anyway, thanks for the info.
Best wishes,
Mike
From: Sean Owen so...@cloudera.com
To: Michael Albert m_albert...@yahoo.com
Cc: user@spark.apache.org
Sent: Friday, December 26, 2014 3:23 PM
Subject: Re: unable to do group by with 1st column
Here
*From:* Tobias Pfeiffer [mailto:t...@preferred.jp]
*Sent:* Friday, December 26, 2014 6:35 AM
*To:* Amit Behera
*Cc:* u...@spark.incubator.apache.org
*Subject:* Re: unable to do group by with 1st column
Hi,
On Fri, Dec 26, 2014 at 5:22 AM, Amit Behera amit.bd...@gmail.com wrote:
How can I do it? Please help me to do.
Have you considered using groupByKey?
http://spark.apache.org/docs/latest
there resubmitting the shuffle phase.
Happy holidays, all!
-Mike
From: Amit Behera amit.bd...@gmail.com
To: u...@spark.incubator.apache.org
Sent: Thursday, December 25, 2014 3:22 PM
Subject: unable to do group by with 1st column
Hi Users,
I am reading a CSV file, and my data format is like this:
key1,value1
key1,value2
key1,value1
key1,value3
key2,value1
key2,value5
key2,value5
key2,value4
key1,value4
key1,value4
key3,value1
key3,value1
key3,value2
Required output:
key1:[value1,value2,value1,value3,value4,value4]
Hi,
On Fri, Dec 26, 2014 at 5:22 AM, Amit Behera amit.bd...@gmail.com wrote:
How can I do it? Please help me to do.
Have you considered using groupByKey?
http://spark.apache.org/docs/latest/programming-guide.html#transformations
Tobias
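
To make the suggestion concrete, here is a minimal sketch of grouping "key,value" lines by their first column, written against plain Scala collections so it runs without Spark. The names `GroupByFirstColumn` and `group` are made up for this example, and skipping lines that have no comma is an assumption about how incomplete rows should be handled. On an RDD the same shape would be a `map` to (key, value) pairs followed by `groupByKey`.

```scala
object GroupByFirstColumn {
  // Split each "key,value" line at the first comma and collect the values
  // per key, keeping the values in their original order.
  // Assumption: lines without a comma are skipped.
  def group(lines: Seq[String]): Map[String, Seq[String]] =
    lines
      .map(_.split(",", 2))
      .collect { case Array(k, v) => (k.trim, v.trim) }
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2) }

  def main(args: Array[String]): Unit = {
    // Sample rows taken from the original question.
    val lines = Seq(
      "key1,value1", "key1,value2", "key1,value1", "key1,value3",
      "key2,value1", "key2,value5", "key2,value5", "key2,value4",
      "key1,value4", "key1,value4",
      "key3,value1", "key3,value1", "key3,value2"
    )
    // Print one "key:[v1,v2,...]" line per key, sorted by key.
    group(lines).toSeq.sortBy(_._1).foreach { case (k, vs) =>
      println(k + ":" + vs.mkString("[", ",", "]"))
    }
  }
}
```

Duplicate values are kept, since the required output in the question lists them; dropping them would need an extra `distinct` on each group.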
;
}
});
From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Friday, December 26, 2014 6:35 AM
To: Amit Behera
Cc: u...@spark.incubator.apache.org
Subject: Re: unable to do group by with 1st column
Hi,
On Fri, Dec 26, 2014 at 5:22 AM, Amit Behera amit.bd...@gmail.com