Re: GraphX and Spark

2014-11-04 Thread Kamal Banga
GraphX is build on *top* of Spark, so Spark can achieve whatever GraphX can.

On Wed, Nov 5, 2014 at 9:41 AM, Deep Pradhan pradhandeep1...@gmail.com
wrote:

 Hi,
 Can Spark achieve whatever GraphX can?
 Keeping aside the performance comparison between Spark and GraphX, if I
 want to implement any graph algorithm and I do not want to use GraphX, can
 I get the work done with Spark?

 Than You



Re: [GraphX] How spark parameters relate to Pregel implementation

2014-08-04 Thread Ankur Dave
At 2014-08-04 20:52:26 +0800, Bin wubin_phi...@126.com wrote:
 I wonder how spark parameters, e.g., number of paralellism, affect Pregel 
 performance? Specifically, sendmessage, mergemessage, and vertexprogram?

 I have tried label propagation on a 300,000 edges graph, and I found that no 
 paralellism is much faster than 5 or 500 paralellism.

Increasing the level of parallelism will increase storage overhead (because 
each vertex will need to be replicated to more edge partitions to form the 
triplets) and will also increase communication. Unless there's something to be 
gained from higher parallelism, this will worsen performance. Additionally, 
going from no parallelism to some parallelism will incur the extra cost of task 
communication via shuffles.

Parallelism has two benefits: it allows edge scans and aggregations to proceed 
in parallel, and it enables the graph to be stored across many machines.

For small graphs, the slight performance gain due to parallelism is vastly 
outweighed by the cost of inter-process communication and shuffling to disk, 
and distributed storage is not necessary since the graph fits on a single 
machine. There are single-machine graph processing systems such as X-Stream [1] 
and GraphChi [2] that optimize performance for these kinds of graphs.

However, parallelism becomes necessary for larger graphs with hundreds of 
millions of edges or large amounts of associated vertex and edge data. GraphX 
is designed for this scale of data.

Ankur

[1] http://infoscience.epfl.ch/record/188535/files/paper.pdf
[2] http://graphlab.org/projects/graphchi.html

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org