I see, thanks. So, to implement the PageRank variant in which the reset probability is divided by the number of vertices (i.e. a teleport term of (1 - d)/N, with d the damping factor): is it sufficient to change initialMessage to *val initialMessage = (resetProb / graph.vertices.count()) / (1.0 - resetProb)* instead of *val initialMessage = resetProb / (1.0 - resetProb)*, and will that yield correct results?
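A minimal pure-Scala sketch of the reasoning behind the question (no Spark needed). It assumes the delta-based Pregel formulation that GraphX's `PageRank.runUntilConvergence` used at the time: ranks start at 0.0, the first superstep delivers `initialMessage` to every vertex, each vertex updates `newPR = oldPR + (1.0 - resetProb) * msgSum`, and a vertex keeps sending `delta / outDegree` along its out-edges only while `delta > tol`. Because every update is linear in the incoming messages, dividing `initialMessage` by N should divide every final rank by N. Under the same assumption, `tol` is an absolute threshold on the deltas, so it would likely need to be divided by N as well to keep the same stopping behaviour. The object name `DeltaPageRankSketch` and the toy graph are hypothetical.

```scala
// Hypothetical, Spark-free simulation of a delta-based Pregel PageRank,
// assumed to mirror the structure of GraphX's runUntilConvergence.
object DeltaPageRankSketch {

  // n vertices numbered 0 until n; edges are (src, dst) pairs.
  def run(n: Int, edges: Seq[(Int, Int)], resetProb: Double,
          initialMessage: Double, tol: Double, maxIter: Int = 1000): Array[Double] = {
    val outDeg = Array.fill(n)(0)
    edges.foreach { case (s, _) => outDeg(s) += 1 }

    val pr = Array.fill(n)(0.0)    // current rank, starts at 0.0
    val delta = Array.fill(n)(0.0) // change in rank from the last update

    // Superstep 0: every vertex receives initialMessage.
    var msgs: Map[Int, Double] = (0 until n).map(v => v -> initialMessage).toMap
    var iter = 0
    while (msgs.nonEmpty && iter < maxIter) {
      // Vertex program, applied only to vertices that received a message.
      for ((v, m) <- msgs) {
        val newPR = pr(v) + (1.0 - resetProb) * m
        delta(v) = newPR - pr(v)
        pr(v) = newPR
      }
      val active = msgs.keySet
      // Send delta / outDegree along out-edges of active vertices whose delta exceeds tol,
      // then sum the messages per destination (the message combiner).
      msgs = edges
        .collect { case (s, d) if active(s) && delta(s) > tol => d -> (delta(s) / outDeg(s)) }
        .groupBy(_._1)
        .map { case (d, xs) => d -> xs.map(_._2).sum }
      iter += 1
    }
    pr
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1 (no dangling vertices).
    val edges = Seq((0, 1), (1, 2), (2, 0), (2, 1))
    val n = 3
    val resetProb = 0.15
    val unscaled = run(n, edges, resetProb, resetProb / (1.0 - resetProb), tol = 1e-10)
    // Divide initialMessage by N and, to keep the same stopping behaviour, tol as well.
    val scaled = run(n, edges, resetProb, (resetProb / n) / (1.0 - resetProb), tol = 1e-10 / n)
    println(unscaled.mkString(", "))
    println(scaled.map(_ * n).mkString(", ")) // should match the line above up to rounding
    println(scaled.sum)
  }
}
```

On this toy graph the scaled ranks should equal the unscaled ones divided by 3 and sum to approximately 1. Neither version redistributes the rank of dangling vertices (vertices with no out-edges), so on graphs that contain them the scaled ranks would sum to less than 1.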
Another question: I load a graph and specify the number of partitions to use (this should be some multiple of the total number of cores used, i.e. number of machines * number of cores per machine?). The partitions show up in the Spark UI after loading the graph. However, while PageRank runs, the number of RDDs grows significantly over the course of the algorithm (with a total size even larger than that of the input graph). Is this due to the read-only nature of RDDs? In each iteration, are new RDDs created to store the intermediate PageRank results?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pagerank-implementation-tp19013p19196.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.