Long running time for GraphX pagerank in dataset com-Friendster

2014-04-20 Thread Qi Song
Hello~
I was running some PageRank tests with GraphX on my 8-node cluster. I
allocated each worker 32 GB of memory and 8 CPU cores. The LiveJournal dataset
took 370 s, which seems reasonable to me. But when I tried the
com-Friendster data ( http://snap.stanford.edu/data/com-Friendster.html ),
with 65608366 nodes and 1806067135 edges, it has been running for more than
70 hours and is still not finished. I'm not sure what causes this: the
graph's structure, or some property of GraphX that I'm not aware of?
Thanks~
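
For a sense of scale, the figures above can be turned into a rough size estimate. This is only a back-of-envelope sketch, not a measurement: it assumes each edge is stored as two 8-byte vertex IDs with no per-object overhead, and that all worker memory is available for the graph. Neither assumption comes from the post, and in-memory JVM representations are usually larger than the raw size.

```python
# Back-of-envelope size of the com-Friendster edge list versus cluster memory.
# Assumptions (NOT from the original post): each edge is two 8-byte vertex IDs
# with no object overhead, and all worker memory is usable for the graph.

NUM_NODES = 65_608_366        # from the post
NUM_EDGES = 1_806_067_135     # from the post
NUM_WORKERS = 8               # from the post
MEM_PER_WORKER_GIB = 32       # from the post (treated as GiB here)
BYTES_PER_VERTEX_ID = 8       # assumption: 64-bit vertex IDs

GIB = 1024 ** 3


def raw_edge_list_gib(num_edges, bytes_per_id=BYTES_PER_VERTEX_ID):
    """Raw size of an edge list stored as (source ID, destination ID) pairs."""
    return num_edges * 2 * bytes_per_id / GIB


raw_gib = raw_edge_list_gib(NUM_EDGES)
cluster_gib = NUM_WORKERS * MEM_PER_WORKER_GIB
avg_degree = NUM_EDGES / NUM_NODES

print(f"average edges per node: {avg_degree:.1f}")
print(f"raw edge list:          {raw_gib:.1f} GiB")
print(f"total worker memory:    {cluster_gib} GiB")
print(f"raw size / memory:      {raw_gib / cluster_gib:.1%}")
```

Under these assumptions the raw edge list is well below the total worker memory, so the raw data volume alone would not account for a run of this length.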
 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Long-running-time-for-GraphX-pagerank-in-dataset-com-Friendster-tp4511.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Comparing GraphX and GraphLab

2014-04-15 Thread Qi Song
Hi Debasish,
I saw that PageRank on LiveJournal took less than 100 seconds with GraphX on
your EC2 setup. But when I ran the example you provided (LiveJournalPageRank)
on my machines with the same LiveJournal dataset, it took more than 10 minutes.
Here are some details:

Environment: 8 machines, each with 2 x Intel Xeon E5-2650 CPUs, 256 GB memory,
a 6 TB hard disk + 480 GB SSD, InfiniBand, Debian Wheezy OS.
I use this command:
    ./bin/run-example org.apache.spark.examples.graphx.LiveJournalPageRank local hdfs://10.1.1.33:9000/dataset/LiveJournal.txt

Should I set more parameters to get a faster result?
I would also like to know the default allocation of computing resources, since
run-example may not let me set it myself.

Regards~
Qi Song



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Comparing-GraphX-and-GraphLab-tp3112p4265.html