Github user aray commented on the issue:

    https://github.com/apache/spark/pull/16271
  
    **References**
    [PageRank paper](http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf)
    > We need to make an initial assignment of the ranks. This assignment can 
be made by one of several strategies. If it is going to iterate until 
convergence, in general the initial values will not affect final values, just 
the rate of convergence. But we can speed up convergence by choosing a good 
initial assignment.
    
    Since they are more focused on updating values for one evolving graph (the 
internet), they don't really talk about starting from scratch. But this does 
emphasize that the answers do not change, just the rate of convergence.
    
    A more direct statement comes from 
[Wikipedia](https://en.wikipedia.org/wiki/PageRank):
    > PageRank is initialized to the same value for all pages. In the original 
form of PageRank, the sum of PageRank over all pages was the total number of 
pages on the web at that time, so each page in this example would have an 
initial value of 1.
    
    Note that there are two variants of PageRank whose outputs differ by a 
constant multiple; both are determined by the damping factor. We use the 
version that sums to N (most other implementations use the other). More from 
Wikipedia:
    >The difference between them is that the PageRank values in the first 
formula sum to one, while in the second formula each PageRank is multiplied by 
N and the sum becomes N.
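
    For concreteness, the two variants can be written as below. This is a 
sketch in my own notation ($d$ is the damping factor, $L(q)$ the out-degree of 
page $q$, $N$ the number of pages) and assumes every page has at least one 
outgoing link:

$$
PR_1(p) = \frac{1-d}{N} + d \sum_{q \to p} \frac{PR_1(q)}{L(q)},
\qquad
PR_N(p) = (1-d) + d \sum_{q \to p} \frac{PR_N(q)}{L(q)}
$$

    Multiplying the first equation by $N$ gives the second, so 
$PR_N(p) = N \cdot PR_1(p)$: the fixed points differ only by the constant 
factor $N$.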
    
    Essentially, starting with the correct sum puts you closer to the actual 
fixed point and thus gets you faster convergence.
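
    To illustrate the point, here is a minimal sketch in plain Python. It is 
not the GraphX code; the function name `pagerank_sum_n` and the four-node 
graph are made up for the example, and the graph is chosen so that every node 
has at least one outgoing link. It runs the sums-to-N iteration twice, once 
starting from 1 per node and once from 1/N per node, and counts iterations 
until the largest per-node change drops below a tolerance.

```python
def pagerank_sum_n(out_links, init, d=0.85, tol=1e-6, max_iter=1000):
    """Iterate PR(v) = (1 - d) + d * sum(PR(u) / outdeg(u)) until stable.

    out_links maps each node to the list of nodes it links to (assumed
    non-empty for every node). Returns (ranks, iterations_used).
    """
    ranks = {v: init for v in out_links}
    for iteration in range(1, max_iter + 1):
        # Every node starts each round with the reset mass (1 - d).
        new_ranks = {v: 1.0 - d for v in out_links}
        # Each node splits its damped rank evenly among its out-links.
        for u, targets in out_links.items():
            share = d * ranks[u] / len(targets)
            for v in targets:
                new_ranks[v] += share
        change = max(abs(new_ranks[v] - ranks[v]) for v in out_links)
        ranks = new_ranks
        if change < tol:
            return ranks, iteration
    return ranks, max_iter


# Hypothetical toy graph: no dangling nodes, so the fixed point sums to N.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(graph)

ranks_one, iters_one = pagerank_sum_n(graph, init=1.0)        # initial sum = N
ranks_frac, iters_frac = pagerank_sum_n(graph, init=1.0 / n)  # initial sum = 1

print("init 1:  ", iters_one, "iterations")
print("init 1/N:", iters_frac, "iterations")
print("sum of ranks:", sum(ranks_one.values()))
```

    On this toy graph both runs should settle on the same ranks (summing to 
N), with the run that starts from 1/N needing noticeably more iterations, 
since its initial total is off by a factor of N and that deficit only shrinks 
by the damping factor each round.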
    
    The [NetworkX 
implementation](https://github.com/networkx/networkx/blob/master/networkx/algorithms/link_analysis/pagerank_alg.py#L122)
 uses the variant that sums to 1, hence their initialization values are all 1/N.
    
    igraph is unfortunately not comparable, as they use a [more complex linear 
solver 
approach](https://github.com/igraph/igraph/blob/master/src/prpack/prpack_solver.cpp).
    
    Additional credentials (if it matters): PhD Mathematics with dissertation 
in Graph Theory

