RE: Key-Value decomposition

2014-11-04 Thread Suraj Satishkumar Sheth
Hi David, use something like: val outputRDD = rdd.flatMap(keyValue => keyValue._2.split(";").map(value => (keyValue._1, value)).toArray) Thanks and Regards, Suraj Sheth -----Original Message----- From: david [mailto:david...@free.fr] Sent: Tuesday, November 04, 2014 1:28 PM To:
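A minimal runnable sketch of the flatMap decomposition above, assuming input pairs whose values are semicolon-separated strings; the sample data, app name, and master setting are illustrative, not from the original message:

    import org.apache.spark.{SparkConf, SparkContext}

    object KeyValueDecomposition {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("KeyValueDecomposition").setMaster("local[*]"))

        // Each record holds a key and a semicolon-separated list of values.
        val rdd = sc.parallelize(Seq(("k1", "a;b;c"), ("k2", "d;e")))

        // flatMap expands each record into one (key, value) pair per token.
        val outputRDD = rdd.flatMap(keyValue =>
          keyValue._2.split(";").map(value => (keyValue._1, value)))

        outputRDD.collect().foreach(println)
        // (k1,a) (k1,b) (k1,c) (k2,d) (k2,e)

        sc.stop()
      }
    }

Each input record expands into one (key, value) pair per split token, so the output RDD has as many records as there are value tokens overall.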

RE: Actors and sparkcontext actions

2014-03-04 Thread Suraj Satishkumar Sheth
Hi Ognen, see if this helps. I was working on this: class MyClass[T](sc: SparkContext, flag1: Boolean, rdd: RDD[T], hdfsPath: String) extends Actor { def act() { if (flag1) this.process() else this.count } private def process() { println(sc.textFile(hdfsPath).count)
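A self-contained sketch of the pattern in this preview, using the old scala.actors API that the snippet's act() method implies; the else branch and the usage line are assumptions, since the original message is truncated:

    import scala.actors.Actor
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    class MyClass[T](sc: SparkContext, flag1: Boolean, rdd: RDD[T], hdfsPath: String)
        extends Actor {
      def act() {
        // SparkContext is thread-safe, so actions can be launched from an actor thread.
        if (flag1) process()
        else println(rdd.count())  // assumed: the truncated else branch counts the supplied RDD
      }
      private def process() {
        println(sc.textFile(hdfsPath).count())
      }
    }

    // Usage (assumed): new MyClass(sc, flag1 = true, someRdd, "hdfs:///some/path").start()

The key point is that the SparkContext itself is safe to share across threads, so each actor can fire off independent jobs against the same context.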

RE: Size of RDD larger than Size of data on disk

2014-02-25 Thread Suraj Satishkumar Sheth
On Tue, Feb 25, 2014 at 6:47 AM, Suraj Satishkumar Sheth suraj...@adobe.com wrote: Hi All, I have a folder in HDFS whose files total 47GB in size. I am loading this in Spark as RDD[String] and caching it. The total amount of RAM that Spark uses to cache it is around
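The preview ends before the measured figure, but the effect being asked about is expected: deserialized Java strings carry per-object and encoding overhead, so a cached RDD[String] usually occupies noticeably more RAM than the raw bytes on disk. A hedged sketch of one way to shrink the cached footprint; the HDFS path is hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object CacheSizeCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("CacheSizeCheck").setMaster("local[*]"))

        val lines = sc.textFile("hdfs:///path/to/folder")  // hypothetical path

        // MEMORY_ONLY_SER stores partitions as serialized byte arrays, trading
        // CPU time for a footprint much closer to the on-disk size than plain cache().
        lines.persist(StorageLevel.MEMORY_ONLY_SER)

        println(lines.count())  // forces materialization; per-RDD sizes appear in the web UI's Storage tab
        sc.stop()
      }
    }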