Re: [GraphX] How spark parameters relate to Pregel implementation

2014-08-04 Thread Ankur Dave

At 2014-08-04 20:52:26 +0800, Bin wrote:
> How do Spark parameters, e.g., the level of parallelism, affect Pregel
> performance, specifically sendmessage, mergemessage, and vertexprogram?
>
> I have tried label propagation on a 300,000-edge graph, and I found that no
> parallelism is much faster than a parallelism of 5 or 500.

Increasing the level of parallelism will increase storage overhead (because 
each vertex will need to be replicated to more edge partitions to form the 
triplets) and will also increase communication. Unless there's something to be 
gained from higher parallelism, this will worsen performance. Additionally, 
going from no parallelism to some parallelism will incur the extra cost of task 
communication via shuffles.
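
To make the replication overhead concrete, here is a rough, self-contained
Python sketch (it is not GraphX code). It assumes a random graph with 300,000
edges over a hypothetical 30,000 vertices, and it assigns edges to partitions
round-robin, which is only a stand-in for a real edge partitioner. The
function names are made up for illustration. It counts, on average, how many
edge partitions each vertex ends up in:

```python
import random


def random_edges(num_vertices, num_edges, seed=0):
    """Build a random directed edge list (a hypothetical test graph)."""
    rng = random.Random(seed)
    return [(rng.randrange(num_vertices), rng.randrange(num_vertices))
            for _ in range(num_edges)]


def replication_factor(edges, num_partitions):
    """Average number of edge partitions in which each vertex appears."""
    partitions_of = {}
    for index, (src, dst) in enumerate(edges):
        # Assumption: round-robin edge placement, not GraphX's actual strategy.
        part = index % num_partitions
        partitions_of.setdefault(src, set()).add(part)
        partitions_of.setdefault(dst, set()).add(part)
    total_copies = sum(len(parts) for parts in partitions_of.values())
    return total_copies / len(partitions_of)


# 30,000 vertices is an assumed figure; only the edge count comes from the thread.
edges = random_edges(num_vertices=30_000, num_edges=300_000)
for n in (1, 5, 500):
    print(n, round(replication_factor(edges, n), 2))
```

With a single partition every vertex is stored exactly once. As the number of
partitions grows, each vertex is copied into more of them, up to a limit set
by its degree, and those copies are what must be stored and kept up to date.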

Parallelism has two benefits: it allows edge scans and aggregations to proceed 
in parallel, and it enables the graph to be stored across many machines.

For small graphs, the slight performance gain due to parallelism is vastly 
outweighed by the cost of inter-process communication and shuffling to disk, 
and distributed storage is not necessary since the graph fits on a single 
machine. There are single-machine graph processing systems such as X-Stream [1] 
and GraphChi [2] that optimize performance for these kinds of graphs.

However, parallelism becomes necessary for larger graphs with hundreds of 
millions of edges or large amounts of associated vertex and edge data. GraphX 
is designed for this scale of data.

Ankur

[1] http://infoscience.epfl.ch/record/188535/files/paper.pdf
[2] http://graphlab.org/projects/graphchi.html


[GraphX] How spark parameters relate to Pregel implementation

2014-08-04 Thread Bin

Hi all,

How do Spark parameters, e.g., the level of parallelism, affect Pregel
performance, specifically sendmessage, mergemessage, and vertexprogram?

I have tried label propagation on a 300,000-edge graph, and I found that no
parallelism is much faster than a parallelism of 5 or 500.

Looking for advice!

Thanks a lot!

Best,
Bin
