GraphX twitter

2014-11-18 Thread tom85
I'm having problems running the Twitter graph on a cluster of 4 nodes, each with over 100 GB of RAM and 32 virtual cores. I do have a pre-installed Spark version (built against Hadoop 2.3, because it didn't compile on my system), but I'm loading my graph file from disk without HDFS.

Pagerank implementation

2014-11-15 Thread tom85
Hi, I wonder if the PageRank implementation is correct. More specifically, I am looking at the following function from PageRank.scala (https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala), which is given to Pregel: def vertexProgram(id: