Hi all,

I have been testing GraphX on the soc-LiveJournal1 network from the SNAP
repository. Currently I am running on c3.8xlarge EC2 instances on Amazon.
These instances have 32 cores and 60GB RAM per node, and so far I have run
SSSP, PageRank, and WCC on a 1, 4, and 8 node cluster.

The issues I am having, which are present for all three algorithms, are that
(1) GraphX does not improve between 4 and 8 nodes, and (2) GraphX seems to be
heavily unbalanced, with some machines doing the majority of the computation.

PageRank (20 iterations) is the worst. For 1-node, 4-node, and 8-node
clusters I get the following wall-clock runtimes: 192s, 154s, and 154s.
The lack of scaling is potentially understandable, but the times are
significantly worse than those reported in the paper
https://amplab.cs.berkeley.edu/wp-content/uploads/2014/02/graphx.pdf, where
this algorithm ran in ~75s on a less powerful cluster.

My main concern is that the computation seems to be heavily unbalanced. I
measured the CPU time of all the processes associated with GraphX during
execution, and on the 4-node cluster this yielded the following per-machine
CPU times: 724s, 697s, 2216s, 694s.

Is this normal? Should I expect a more even distribution of work across
machines?

I am using the stock pagerank code found here:
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala.
I use the configurations "spark.executor.memory=40g" and
"spark.cores.max=128" for the 4-node case, and I set the number of edge
partitions to 64.
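For reference, this is roughly how I am loading the graph and invoking PageRank. This is only a sketch: the HDFS path and the EdgePartition2D partitioning strategy are assumptions I made for the test, not something prescribed by the library.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// Driver setup with the configuration values described above
val conf = new SparkConf()
  .setAppName("GraphX PageRank on soc-LiveJournal1")
  .set("spark.executor.memory", "40g")
  .set("spark.cores.max", "128")
val sc = new SparkContext(conf)

// Load the SNAP edge list with 64 edge partitions (path is assumed),
// then repartition edges with the 2D strategy
val graph = GraphLoader
  .edgeListFile(sc, "hdfs:///data/soc-LiveJournal1.txt", numEdgePartitions = 64)
  .partitionBy(PartitionStrategy.EdgePartition2D)

// 20 fixed iterations of the stock PageRank implementation
val ranks = graph.staticPageRank(numIter = 20).vertices
```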

Could you please let me know whether these results are reasonable, or whether
I am doing something wrong? I really appreciate the help.

Thanks,
Steve



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-unbalanced-computation-and-slow-runtime-on-livejournal-network-tp22565.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
