RE: Key-Value decomposition

2014-11-04 Thread Suraj Satishkumar Sheth
Hi David,
Use something like:

val outputRDD = rdd.flatMap(keyValue => keyValue._2.split(";").map(value => (keyValue._1, value)).toArray)
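
For a self-contained illustration, here is a minimal sketch of the same approach (the local master, app name, and sample data below are made up for illustration, not taken from the original message):

import org.apache.spark.{SparkConf, SparkContext}

object Decompose {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup just to demonstrate the transformation.
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("Decompose"))

    val rdd = sc.parallelize(Seq(("A", "1;2;3"), ("B", "2;5;6"), ("C", "3;2;1")))

    // Split each value on ';' and pair every piece with its original key.
    val outputRDD = rdd.flatMap { case (key, values) =>
      values.split(";").map(value => (key, value))
    }

    outputRDD.collect().foreach(println)
    // (A,1) (A,2) (A,3) (B,2) (B,5) (B,6) (C,3) (C,2) (C,1)

    sc.stop()
  }
}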

Thanks and Regards,
Suraj Sheth

-Original Message-
From: david [mailto:david...@free.fr] 
Sent: Tuesday, November 04, 2014 1:28 PM
To: u...@spark.incubator.apache.org
Subject: Re: Key-Value decomposition

Hi,

But I have only one RDD. Here is a more complete example:

My RDD is something like ("A", "1;2;3"), ("B", "2;5;6"), ("C", "3;2;1").

And I expect the following result:

(A,1), (A,2), (A,3), (B,2), (B,5), (B,6), (C,3), (C,2), (C,1)


Any idea how I can achieve this?

Thanks






RE: Actors and sparkcontext actions

2014-03-04 Thread Suraj Satishkumar Sheth
Hi Ognen,
See if this helps. I was working on this:

// Assuming the old scala.actors API, since the class implements act().
import scala.actors.Actor
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class MyClass[T](sc: SparkContext, flag1: Boolean, rdd: RDD[T], hdfsPath: String) extends Actor {

  def act() {
    if (flag1) this.process()
    else this.count()
  }

  private def process() {
    // Count the lines of the HDFS file through the SparkContext.
    println(sc.textFile(hdfsPath).count)
    // do the processing
  }

  private def count() {
    // Count the elements of the RDD that was passed in.
    println(rdd.count)
    // do the counting
  }

}
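
For completeness, a minimal usage sketch, assuming the scala.actors API implied by act() above (the HDFS path and flag value are placeholders, not from the original message):

// Illustrative only: the path and flag are placeholders.
val data: RDD[String] = sc.textFile("/some/hdfs/path")
val worker = new MyClass[String](sc, flag1 = true, rdd = data, hdfsPath = "/some/hdfs/path")
worker.start()  // start() schedules act() on the actor's thread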

Thanks and Regards,
Suraj Sheth


-Original Message-
From: Ognen Duzlevski [mailto:og...@nengoiksvelzud.com] 
Sent: 27 February 2014 01:09
To: u...@spark.incubator.apache.org
Subject: Actors and sparkcontext actions

Can someone point me to a simple, short code example of creating a basic Actor 
that gets a context and runs an operation such as .textFile.count? 
I am trying to figure out how to create just a basic actor that gets a message 
like this:

case class Msg(filename:String, ctx: SparkContext)

and then something like this:

class HelloActor extends Actor {
  import context.dispatcher

  def receive = {
    case Msg(fn, ctx) => {
      // get the count here!
      // ctx.textFile(fn).count
    }
    case _ => println("huh?")
  }
}

Where I would want to do something like:

val conf = new SparkConf()
  .setMaster("spark://192.168.10.29:7077")
  .setAppName("Hello")
  .setSparkHome("/Users/maketo/plainvanilla/spark-0.9")
val sc = new SparkContext(conf)
val system = ActorSystem("mySystem")

val helloActor1 = system.actorOf(Props[HelloActor], name = "helloactor1")
helloActor1 ! new Msg("test.json", sc)
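
One way the receive block could be completed (a minimal sketch, assuming Akka actors and that blocking on the Spark job inside the actor is acceptable for a quick test; the actor name and printed message are hypothetical):

import akka.actor.Actor
import org.apache.spark.SparkContext

case class Msg(filename: String, ctx: SparkContext)

class CountingActor extends Actor {
  def receive = {
    case Msg(fn, ctx) =>
      // Run the Spark job and print the line count; blocking here is fine for a demo.
      val n = ctx.textFile(fn).count()
      println(s"$fn has $n lines")
    case _ => println("huh?")
  }
}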

Thanks,
Ognen


RE: Size of RDD larger than Size of data on disk

2014-02-25 Thread Suraj Satishkumar Sheth
Hi Mayur,
Thanks for replying. Is it usually double the size of the data on disk?
I have observed this many times. The Storage section of the Spark UI tells me that
100% of the RDD is cached, using 97 GB of RAM, while the data in HDFS is only 47 GB.

Thanks and Regards,
Suraj Sheth

From: Mayur Rustagi [mailto:mayur.rust...@gmail.com]
Sent: Tuesday, February 25, 2014 11:19 PM
To: user@spark.apache.org
Cc: u...@spark.incubator.apache.org
Subject: Re: Size of RDD larger than Size of data on disk

Spark may take more RAM than required by the RDD. Can you look at the Storage section
of the Spark UI and see how much space the RDD is taking in memory? It may still take more
space than on disk, as Java objects have some overhead.
Consider enabling compression in the RDD.
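
For example, a minimal sketch of what that usually means in practice, serialized caching plus spark.rdd.compress (the storage level choice and the HDFS path below are assumptions for illustration, not from this thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("CompressedCache")
  // Compress serialized RDD partitions; this only applies to *_SER storage levels.
  .set("spark.rdd.compress", "true")
val sc = new SparkContext(conf)

// Cache in serialized (and therefore compressible) form instead of as deserialized Java objects.
val lines = sc.textFile("hdfs:///some/folder")
lines.persist(StorageLevel.MEMORY_ONLY_SER)
println(lines.count())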

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi


On Tue, Feb 25, 2014 at 6:47 AM, Suraj Satishkumar Sheth 
suraj...@adobe.com wrote:
Hi All,
I have a folder in HDFS whose files total 47 GB. I am loading this in Spark as an
RDD[String] and caching it. The total amount of RAM that Spark uses to cache it is around
97 GB. I want to know why Spark is taking up so much space for the RDD. Can we reduce the
RDD size in Spark and make it similar to its size on disk?

Thanks and Regards,
Suraj Sheth