Hi David,
Use something like:

val outputRDD = rdd.flatMap(keyValue => keyValue._2.split(";").map(value =>
  (keyValue._1, value)))
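The same split-and-expand pattern can be checked without a Spark cluster, since RDD.flatMap behaves like flatMap on an ordinary Scala collection. A minimal plain-Scala sketch (the object and method names here are illustrative, not from the thread):

```scala
// Plain-Scala analogue of the rdd.flatMap above: expand each
// (key, "a;b;c") pair into one (key, value) pair per token.
object FlatMapSplitExample {
  def expand(pairs: Seq[(String, String)]): Seq[(String, String)] =
    pairs.flatMap { case (key, values) =>
      // split the value on ';' and pair each token with the original key
      values.split(";").map(value => (key, value))
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(("k1", "a;b"), ("k2", "c"))
    println(expand(input))
  }
}
```

On an RDD of pairs the body is identical; only the receiver changes from a local Seq to the RDD.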
Thanks and Regards,
Suraj Sheth
-----Original Message-----
From: david [mailto:david...@free.fr]
Sent: Tuesday, November 04, 2014 1:28 PM
To:
Hi Ognen,
See if this helps. I was working on this:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.actors.Actor

class MyClass[T](sc : SparkContext, flag1 : Boolean, rdd : RDD[T],
    hdfsPath : String) extends Actor {
  def act() {
    if (flag1) this.process()
    else println(rdd.count)
  }
  private def process() {
    println(sc.textFile(hdfsPath).count)
  }
}
On Tue, Feb 25, 2014 at 6:47 AM, Suraj Satishkumar Sheth
suraj...@adobe.com wrote:
Hi All,
I have a folder in HDFS whose files total 47 GB. I am loading it into Spark as
an RDD[String] and caching it. The total amount of RAM that Spark uses to
cache it is around