GraphX is built on *top* of Spark, so Spark can achieve whatever GraphX can.
On Wed, Nov 5, 2014 at 9:41 AM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
Hi,
Can Spark achieve whatever GraphX can?
Keeping aside the performance comparison between Spark and GraphX, if I
want to implement any
I don't think it can be done directly with .aggregateByKey(), because we will
need the count per key (for the average). Maybe we can use .countByKey(), which
returns a map, and .foldByKey(0)(_+_) (or aggregateByKey()), which gives the sum
of values per key. I am not sure myself how to proceed.
Regards
On Fri, Oct 31,
In IntelliJ, Tools > Generate Scaladoc.
Kamal
On Fri, Oct 31, 2014 at 5:35 AM, Alessandro Baretta alexbare...@gmail.com
wrote:
How do I build the scaladoc html files from the spark source distribution?
Alex Baretta
You can also use PairRDDFunctions' saveAsNewAPIHadoopFile, which takes an
OutputFormat class.
So you will have to write a custom OutputFormat class that extends
OutputFormat. In this class, you will have to implement getRecordWriter,
which returns a custom RecordWriter.
So you will also have to
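As a shape-only illustration of that pattern: the real base classes are org.apache.hadoop.mapreduce.OutputFormat and RecordWriter, but the minimal stand-in traits and the CSV writer below are hypothetical, defined inline so the sketch is self-contained and shows getRecordWriter returning a custom RecordWriter:

```scala
// Hypothetical stand-ins for the Hadoop types, just to show the pattern:
// an OutputFormat whose getRecordWriter hands back a custom RecordWriter.
trait RecordWriter[K, V] { def write(key: K, value: V): Unit }
trait OutputFormat[K, V] { def getRecordWriter(): RecordWriter[K, V] }

// A made-up writer that renders each (key, value) pair as a CSV line
class CsvRecordWriter(out: StringBuilder) extends RecordWriter[String, Int] {
  def write(key: String, value: Int): Unit = out.append(s"$key,$value\n")
}

class CsvOutputFormat(out: StringBuilder) extends OutputFormat[String, Int] {
  def getRecordWriter(): RecordWriter[String, Int] = new CsvRecordWriter(out)
}

val sink = new StringBuilder
val writer = new CsvOutputFormat(sink).getRecordWriter()
writer.write("a", 1)
writer.write("b", 2)
// sink now holds "a,1\nb,2\n"
```

With the real Hadoop classes, Spark calls getRecordWriter once per task and streams that task's pairs through write, so any per-record formatting logic belongs in the RecordWriter.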
Hi Flavio,
Doing batch += ... won't work. It will create a new copy of batch for each
task that processes myRDD (also, val declares an immutable variable; var is
for mutable ones). You can use something like accumulators
http://spark.apache.org/docs/latest/programming-guide.html#accumulators.
val
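A minimal sketch of why the captured variable stays unchanged: the closure (and any variable it captures) is serialized and shipped to each task, so every task mutates its own private copy. Plain Scala sequences stand in for tasks below (the data is made up; the real fix is the accumulator API linked above, sc.accumulator(0) in the 1.x docs):

```scala
// Driver-side variable that a naive closure would try to mutate
var counter = 0
val partitions = Seq(Seq(1, 2), Seq(3, 4, 5))

// Simulate what happens on workers: each task effectively gets a copy
val perTaskCopies = partitions.map { part =>
  var localCounter = counter   // a private copy, as serialization would produce
  part.foreach(_ => localCounter += 1)
  localCounter                 // mutated locally, never merged back into counter
}
// counter is still 0 on the "driver", even though the tasks counted 2 and 3.
// An accumulator works because Spark merges the per-task partials on the driver:
val accumulated = perTaskCopies.sum // what accumulator.value would report
```

This is why mutating a captured val or var from inside map/foreach silently does nothing on the driver, while an accumulator's partial values are explicitly merged.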
Regards
- Saurabh Wadhawan
On 20-Oct-2014, at 4:56 pm, Kamal Banga banga.ka...@gmail.com wrote:
-- Forwarded message --
From: Kamal Banga ka...@sigmoidanalytics.com
Date: Mon, Oct 20, 2014 at 4:20 PM
Subject: Re: Spark Concepts
To: nsar...@gmail.com
Cc
1. All RDD operations are executed on workers. So reading a text file or
executing val x = 1 inside an RDD operation will happen on a worker. (link:
http://stackoverflow.com/questions/24637312/spark-driver-in-apache-spark)
2.
a. Without broadcast: Let's say you have 'n' nodes. You can set Hadoop's
replication factor to n
Hi All,
The function *mapPartitions* in RDD.scala
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala
takes a boolean parameter *preservesPartitioning*. It seems that if that
parameter is passed as *false*, the passed function f will operate on the data only
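For context, mapPartitions applies f once per partition, to an Iterator over that partition's elements (unlike map, which is per element); preservesPartitioning only tells Spark whether the parent's partitioner is still valid for the result, and does not change what f receives. A plain-Scala sketch of that per-partition calling convention (no SparkContext; the data and the function f are made up):

```scala
// Two "partitions" of an RDD, as plain Scala collections
val partitions = Seq(Seq(1, 2, 3), Seq(4, 5))

// f sees a whole partition's iterator at once and may emit any number
// of elements — here, one sum per partition
val f = (it: Iterator[Int]) => Iterator(it.sum)

// mapPartitions calls f once per partition
val result = partitions.map(part => f(part.iterator).toList)
// result: one List(6) and one List(9), one output list per partition
```

Passing preservesPartitioning = true is only safe when f does not change keys, because Spark will then keep treating the output as partitioned by the parent's partitioner.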