Re: GraphX and Spark

2014-11-04 Thread Kamal Banga
GraphX is built on *top* of Spark, so Spark can achieve whatever GraphX can. On Wed, Nov 5, 2014 at 9:41 AM, Deep Pradhan pradhandeep1...@gmail.com wrote: Hi, Can Spark achieve whatever GraphX can? Keeping aside the performance comparison between Spark and GraphX, if I want to implement any

Re: about aggregateByKey and standard deviation

2014-11-03 Thread Kamal Banga
I don't think it can be done directly with .aggregateByKey(), because we will need the count of values per key (for the average). Maybe we can use .countByKey(), which returns a map, and .foldByKey(0)(_+_) (or aggregateByKey()), which gives the sum of values per key. I myself am not sure how to proceed. Regards On Fri, Oct 31,
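A single pass can in fact carry everything the standard deviation needs, by tracking (count, sum, sum of squares) per key. The sketch below shows that merge logic in plain Scala (no Spark required), using the same zero/seqOp/combOp shape that an `aggregateByKey` call would take; `StdDevPerKey` and its helpers are hypothetical names, not from the thread.

```scala
// Sketch: per-key (count, sum, sumSq) in one pass -- the same state an
// aggregateByKey(zero)(seqOp, combOp) would carry. Plain Scala stand-in
// so the combine logic can be checked without a Spark cluster.
object StdDevPerKey {
  // (count, sum, sum of squares)
  type Acc = (Long, Double, Double)
  val zero: Acc = (0L, 0.0, 0.0)

  // Fold one value into a partition-local accumulator (Spark's seqOp).
  def seqOp(acc: Acc, v: Double): Acc =
    (acc._1 + 1, acc._2 + v, acc._3 + v * v)

  // Merge two accumulators from different partitions (Spark's combOp).
  def combOp(a: Acc, b: Acc): Acc =
    (a._1 + b._1, a._2 + b._2, a._3 + b._3)

  // Per key: (mean, population standard deviation).
  def stats(pairs: Seq[(String, Double)]): Map[String, (Double, Double)] =
    pairs.groupBy(_._1).map { case (k, vs) =>
      val (n, s, sq) = vs.map(_._2).foldLeft(zero)(seqOp)
      val mean = s / n
      k -> (mean, math.sqrt(sq / n - mean * mean))
    }
}
```

With Spark, the same `zero`, `seqOp`, and `combOp` should drop straight into `rdd.aggregateByKey(zero)(seqOp, combOp)`, giving count, sum, and sum of squares per key in one shuffle instead of separate `countByKey`/`foldByKey` passes.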

Re: Scaladoc

2014-10-31 Thread Kamal Banga
In IntelliJ: Tools > Generate Scaladoc. Kamal On Fri, Oct 31, 2014 at 5:35 AM, Alessandro Baretta alexbare...@gmail.com wrote: How do I build the scaladoc html files from the spark source distribution? Alex Baretta

Re: Using a Database to persist and load data from

2014-10-31 Thread Kamal Banga
You can also use PairRDDFunctions' saveAsNewAPIHadoopFile, which takes an OutputFormat class. So you will have to write a custom class that extends OutputFormat. In this class, you will have to implement getRecordWriter, which returns a custom RecordWriter. So you will also have to

Re: Batch of updates

2014-10-28 Thread Kamal Banga
Hi Flavio, Doing batch += ... shouldn't work. It will create a new batch for each element in myRDD (also, val initializes an immutable variable; var is for mutable variables). You can use something like accumulators http://spark.apache.org/docs/latest/programming-guide.html#accumulators. val
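The val/var distinction above can be checked in plain Scala, independent of Spark. A minimal sketch (the names `batch` and `buf` are illustrative): a `var` bound to an immutable collection is rebound on each "append", while a `val` holding a mutable buffer keeps one reference and mutates in place.

```scala
import scala.collection.mutable.ArrayBuffer

object BatchDemo {
  def main(args: Array[String]): Unit = {
    // var + immutable Vector: `:+` builds a NEW collection each time and
    // the var is rebound to it. `+=` would not even compile here.
    var batch = Vector.empty[Int]
    (1 to 3).foreach { i => batch = batch :+ i }
    println(batch) // Vector(1, 2, 3)

    // val + mutable ArrayBuffer: the reference is fixed, the contents
    // mutate in place, so `+=` works.
    val buf = ArrayBuffer.empty[Int]
    (1 to 3).foreach(buf += _)
    println(buf)
  }
}
```

Note that inside a Spark closure neither form accumulates back to the driver: the closure is serialized to executors, so each task updates only its own copy of the variable. That is exactly the gap accumulators are designed to fill.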

Re: What executes on worker and what executes on driver side

2014-10-28 Thread Kamal Banga
Regards - Saurabh Wadhawan On 20-Oct-2014, at 4:56 pm, Kamal Banga banga.ka...@gmail.com wrote: 1. All RDD operations are executed in workers. So reading a text file or executing val x = 1 will happen on worker. (link http://stackoverflow.com

Re: Spark Concepts

2014-10-20 Thread Kamal Banga
, dissemination, distribution, copying or the taking of any action in reliance on the information herein is prohibited. -- Forwarded message -- From: Kamal Banga ka...@sigmoidanalytics.com Date: Mon, Oct 20, 2014 at 4:20 PM Subject: Re: Spark Concepts To: nsar...@gmail.com Cc

Re: What executes on worker and what executes on driver side

2014-10-20 Thread Kamal Banga
1. All RDD operations are executed on workers. So reading a text file or executing val x = 1 will happen on a worker. (link http://stackoverflow.com/questions/24637312/spark-driver-in-apache-spark) 2. a. Without broadcast: Let's say you have 'n' nodes. You can set Hadoop's replication factor to n

preservesPartitioning

2014-07-17 Thread Kamal Banga
Hi All, The function *mapPartitions* in RDD.scala https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala takes a boolean parameter *preservesPartitioning*. It seems if that parameter is passed as *false*, the passed function f will operate on the data only