I am not an expert, but some thoughts inline...

On Jun 16, 2016 6:31 AM, "Maja Kabiljo" <majakabi...@fb.com> wrote:
>
> Hi,
>
> We are running some experiments with GraphX in order to compare it with
> other systems. There are multiple settings which significantly affect
> performance, and we experimented a lot in order to tune them well. I'll
> share the best settings we have found so far and the results we got with
> them, and would really appreciate advice from anyone who has used GraphX
> before on what else could make it even better, or confirmation that these
> results are as good as they get.
>
> The algorithms we used are PageRank and connected components. We used the
> Twitter and UK graphs from the GraphX paper
> (https://amplab.cs.berkeley.edu/wp-content/uploads/2014/09/graphx.pdf), and
> also generated graphs with properties similar to the Facebook social graph,
> with varying numbers of edges. Apart from performance, we also tried to
> determine the minimum amount of resources needed to handle a graph of a
> given size.
>
> We ran our experiments using Spark 1.6.1, on machines with 20 cores and
> 2-way SMT, always fixing the number of executors (min = max = initial),
> giving 40GB or 80GB per executor, and making sure we ran only a single
> executor per machine.

*****Deepak*****
I guess you have 16 machines in your test. Is that right?
*****Deepak*****
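For anyone reproducing a setup like this, a minimal sketch of how such a
fixed-executor configuration might be expressed in Scala; the app name,
executor count, and memory below are illustrative placeholders, not the
exact configuration used in these tests:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: fix the number of executors (min = max = initial) by
    // disabling dynamic allocation and pinning the instance count.
    val conf = new SparkConf()
      .setAppName("graphx-benchmark")                  // hypothetical name
      .set("spark.dynamicAllocation.enabled", "false") // no autoscaling
      .set("spark.executor.instances", "16")           // placeholder count
      .set("spark.executor.memory", "80g")             // 40g or 80g in the tests
      .set("spark.executor.cores", "20")               // 20-core machines
    val sc = new SparkContext(conf)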

> Additionally we used:
> spark.shuffle.manager=hash, spark.shuffle.service.enabled=false
> Parallel GC
> PartitionStrategy.EdgePartition2D
> 8*numberOfExecutors partitions
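A sketch of how the settings in that list translate to code, building on
the SparkConf and `sc` above; the input path and executor count are
placeholders:

    // Sketch: the shuffle/GC settings above would be added to the same
    // SparkConf, before the SparkContext is created:
    //   .set("spark.shuffle.manager", "hash")
    //   .set("spark.shuffle.service.enabled", "false")
    //   .set("spark.executor.extraJavaOptions", "-XX:+UseParallelGC")

    import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

    // Load edges into 8 * numberOfExecutors partitions and repartition
    // with EdgePartition2D, matching the settings above.
    val numExecutors = 16                          // placeholder
    val graph = GraphLoader
      .edgeListFile(sc, "hdfs:///data/edges.txt",  // placeholder path
        numEdgePartitions = 8 * numExecutors)
      .partitionBy(PartitionStrategy.EdgePartition2D)
      .cache()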
> Here are some data points which we got:
> Running on a Facebook-like graph with 2 billion edges, using 4 executors
> with 80GB each, it took 451 seconds to do 20 iterations of PageRank and
> 236 seconds to find connected components. It failed when we tried to use
> 2 executors, or 4 executors with 40GB each.
> For a graph with 10 billion edges we needed 16 executors with 80GB each
> (it failed with 8); 1041 seconds for 20 iterations of PageRank and 716
> seconds for connected components.
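For reference, the two measurements above correspond to calls like the
following, assuming the partitioned `graph` from the earlier sketch:

    // Sketch: 20 PageRank iterations and a connected-components run on
    // the graph built above.
    val ranks = graph.staticPageRank(numIter = 20).vertices
    val cc = graph.connectedComponents().vertices
    println(s"ranked ${ranks.count()} vertices; " +
      s"found ${cc.map(_._2).distinct().count()} components")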

*****Deepak*****
The executors are not scaling linearly; you should need at most 10
executors. Also, what error does it show with 8 executors?
*****Deepak*****

> Twitter-2010 graph (1.5 billion edges), 8 executors with 40GB each:
> PageRank 473s, connected components 264s. With 4 executors with 80GB each
> it worked but was struggling (PR 2475s, CC 4499s); with 8 executors with
> 80GB each, PR 362s, CC 255s.

*****Deepak*****
For 4 executors, can you try with 160GB each? Also, if you could spell out
the system statistics during the test it would be great. My guess is that
with 4 executors a lot of spilling is happening.
*****Deepak*****
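One way to surface those statistics from within the job is a listener
that tallies spill metrics per task; a rough sketch using Spark's task
metrics, where the counters and printout are illustrative rather than
anything from the original tests:

    import java.util.concurrent.atomic.AtomicLong
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Sketch: tally how many bytes tasks spill to memory and disk, to
    // confirm (or rule out) the spilling hypothesis.
    val memSpilled = new AtomicLong(0)
    val diskSpilled = new AtomicLong(0)
    sc.addSparkListener(new SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null) {                      // metrics can be absent on failure
          memSpilled.addAndGet(m.memoryBytesSpilled)
          diskSpilled.addAndGet(m.diskBytesSpilled)
        }
      }
    })
    // ... run the job, then:
    println(s"spilled: mem=${memSpilled.get}B, disk=${diskSpilled.get}B")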

> One more thing: we were not able to reproduce what's mentioned in the
> paper about fault tolerance (section 5.2). If we kill an executor during
> the first few iterations it recovers successfully, but if an executor is
> killed in later iterations, reconstruction of each iteration starts taking
> exponentially longer and doesn't finish after letting it run for a few
> hours. Are there some additional parameters we need to set in order for
> this to work?
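The exponentially growing reconstruction suggests that the RDD lineage
keeps growing across iterations with nothing truncating it. A hedged
guess rather than a confirmed fix: periodically checkpointing the graph
should cut the lineage so recovery only replays a few iterations. A
minimal sketch of the idea, assuming a hand-rolled PageRank-style loop
(the built-in pagerank in 1.6 does not checkpoint) and reusing `sc` and
the imports from the sketches above; the checkpoint directory, input
path, and interval are placeholders:

    // Sketch: PageRank-style loop that checkpoints every few iterations
    // to truncate lineage.
    sc.setCheckpointDir("hdfs:///tmp/graphx-checkpoints") // placeholder

    val raw = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt") // placeholder
    var g = raw
      .outerJoinVertices(raw.outDegrees) { (_, _, deg) => deg.getOrElse(0) }
      .mapTriplets(e => 1.0 / math.max(e.srcAttr, 1)) // out-degree weights
      .mapVertices((_, _) => 1.0)                     // initial rank

    for (i <- 1 to 20) {
      val msgs = g.aggregateMessages[Double](
        ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr),
        _ + _)
      g = g.outerJoinVertices(msgs) { (_, _, msg) =>
        0.15 + 0.85 * msg.getOrElse(0.0)
      }
      if (i % 5 == 0) {       // placeholder interval
        g.checkpoint()        // mark vertices and edges for checkpointing
        g.vertices.count()    // force materialization so lineage is cut
      }
    }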
>
> Any feedback would be highly appreciated!
>
> Thank you,
> Maja
