Artem Aliev created TINKERPOP-1783:
--------------------------------------

             Summary: PageRank gives incorrect results for graphs with sinks
                 Key: TINKERPOP-1783
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1783
             Project: TinkerPop
          Issue Type: Bug
    Affects Versions: 3.2.6, 3.1.8, 3.3.0
            Reporter: Artem Aliev


{quote} Sink vertices (those with no outgoing edges) should evenly distribute 
their rank to the entire graph but in the current implementation it is just 
lost.
{quote} 
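To illustrate what "evenly distribute their rank" could look like, here is a minimal Python sketch of one PageRank step. It uses the convention where ranks sum to the vertex count, and the rank held by sinks is shared across all vertices instead of being dropped. The function name and the three-vertex graph are hypothetical. This is not TinkerPop's or Spark's implementation.

```python
# Minimal sketch (hypothetical helper, not TinkerPop code): one PageRank step in
# the "ranks sum to the vertex count" convention, where rank held by sink vertices
# is spread evenly over every vertex instead of being lost.

def pagerank_step_redistribute(ranks, out_edges, alpha=0.85):
    """`ranks` maps vertex -> rank; `out_edges` maps vertex -> list of targets."""
    n = len(ranks)
    incoming = {v: 0.0 for v in ranks}
    sink_mass = 0.0
    for u, r in ranks.items():
        targets = out_edges.get(u, [])
        if targets:
            share = r / len(targets)
            for v in targets:
                incoming[v] += share
        else:
            # Sink: collect its rank so it can be shared by all n vertices.
            sink_mass += r
    return {v: (1 - alpha) + alpha * (incoming[v] + sink_mass / n) for v in ranks}


# Hypothetical example graph: a -> b, a -> c; b and c are sinks.
edges = {"a": ["b", "c"]}
ranks = {"a": 1.0, "b": 1.0, "c": 1.0}
for _ in range(40):
    ranks = pagerank_step_redistribute(ranks, edges)
print(round(sum(ranks.values()), 6))  # the total stays at 3.0
```

Because every vertex's rank is either passed along an edge or shared through `sink_mass`, each step returns exactly `n * (1 - alpha) + alpha * n = n` when the input sums to `n`.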

Wikipedia: https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm
{quote}  In the original form of PageRank, the sum of PageRank over all pages 
was the total number of pages on the web at that time
{quote} 
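A short derivation shows why sinks break this property. It assumes the unnormalised update rule in the first line, with every vertex starting at rank 1.0. This rule is inferred because it reproduces the console sums in the reproduction below; it is not copied from TinkerPop's source. Here N is the vertex count, d^+(u) is the out-degree of u, and \alpha is the damping factor (0.85 in the examples). Each non-sink u splits r_t(u) over its out-edges, so it contributes exactly r_t(u) to the total. A sink contributes nothing.

```latex
\begin{aligned}
r_{t+1}(v) &= (1-\alpha) + \alpha \sum_{u \to v} \frac{r_t(u)}{d^{+}(u)} \\
\sum_{v} r_{t+1}(v) &= N(1-\alpha) + \alpha \sum_{u:\, d^{+}(u) > 0} r_t(u) \\
&= N - \alpha \sum_{s:\, d^{+}(s) = 0} r_t(s) \qquad \text{when } \sum_{u} r_t(u) = N
\end{aligned}
```

For the modern graph at t = 0 (N = 6; the sinks vadas, lop and ripple each have rank 1.0), this gives 6 - 0.85 * 3 = 3.45, which matches the one-iteration sum in the reproduction.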

I found the issue while comparing results with Spark GraphX.
This issue is therefore a copy of https://issues.apache.org/jira/browse/SPARK-18847

How to reproduce:
{code}
gremlin> graph = TinkerFactory.createModern()
gremlin> g = graph.traversal().withComputer()
gremlin> g.V().pageRank(0.85).times(40).by('pageRank').values('pageRank').sum()
==>1.318625
gremlin> g.V().pageRank(0.85).times(1).by('pageRank').values('pageRank').sum()
==>3.4499999999999997
// initial values (no iterations):
gremlin> g.V().pageRank(0.85).times(0).by('pageRank').values('pageRank').sum()
==>6.0
{code}
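The numbers above can be recreated outside TinkerPop with a small Python sketch. Two assumptions apply. First, the edge map is transcribed by hand from TinkerFactory.createModern(), using all out-edges, both `knows` and `created`. Second, the update rule r'(v) = (1 - alpha) + alpha * sum(r(u) / outdeg(u)), starting from 1.0, is inferred from the printed sums and is not taken from PageRankVertexProgram. `pagerank_sum` is a hypothetical helper.

```python
# Sketch reproducing the console sums. Assumptions: hand-transcribed modern-graph
# edges and an update rule inferred from the printed totals (not TinkerPop source).

# Out-edges of TinkerFactory.createModern() (knows + created), keyed by name.
MODERN = {
    "marko": ["vadas", "josh", "lop"],
    "josh": ["ripple", "lop"],
    "peter": ["lop"],
    "vadas": [], "lop": [], "ripple": [],  # sinks: no outgoing edges
}

def pagerank_sum(out_edges, times, alpha=0.85):
    ranks = {v: 1.0 for v in out_edges}  # initial rank 1.0 per vertex
    for _ in range(times):
        incoming = {v: 0.0 for v in out_edges}
        for u, targets in out_edges.items():
            for v in targets:
                incoming[v] += ranks[u] / len(targets)
        # A sink's rank is never sent anywhere, so it drops out of the total.
        ranks = {v: (1 - alpha) + alpha * incoming[v] for v in out_edges}
    return sum(ranks.values())

for t in (0, 1, 40):
    print(t, pagerank_sum(MODERN, t))
```

This should print totals of about 6.0, 3.45 and 1.318625. Because the modern graph is acyclic, the ranks stop changing after three iterations, which is why `times(40)` settles at such a low value.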

The Spark project fixed the issue by normalising the values after each step.
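The normalisation idea can be sketched in Python as follows. After every step, all ranks are rescaled by the same factor so that they again sum to N. This only illustrates the approach mentioned for SPARK-18847; it is not Spark's or TinkerPop's actual code. The edge map and update rule carry the same assumptions as above, and `pagerank_normalised` is a hypothetical name.

```python
# Sketch of "normalise after each step": rescale ranks so they sum to N again.
# Illustration only; not Spark's or TinkerPop's implementation.

# Out-edges of TinkerFactory.createModern() (knows + created), keyed by name.
MODERN = {
    "marko": ["vadas", "josh", "lop"],
    "josh": ["ripple", "lop"],
    "peter": ["lop"],
    "vadas": [], "lop": [], "ripple": [],  # sinks
}

def pagerank_normalised(out_edges, times, alpha=0.85):
    n = len(out_edges)
    ranks = {v: 1.0 for v in out_edges}
    for _ in range(times):
        incoming = {v: 0.0 for v in out_edges}
        for u, targets in out_edges.items():
            for v in targets:
                incoming[v] += ranks[u] / len(targets)
        ranks = {v: (1 - alpha) + alpha * incoming[v] for v in out_edges}
        # Restore the rank leaked through sinks by scaling every vertex equally.
        scale = n / sum(ranks.values())
        ranks = {v: r * scale for v, r in ranks.items()}
    return ranks

ranks = pagerank_normalised(MODERN, 40)
print(sum(ranks.values()))        # close to 6.0
print(max(ranks, key=ranks.get))  # lop
```

Scaling preserves the relative order within a step but not the final values of the unnormalised run, because the constant `(1 - alpha)` term is added before rescaling.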
Another way to fix it is for each sink vertex to send its message to itself (i.e., stay on the same page).
To work around the problem, just add self-pointing edges:
{code}
gremlin> // add a self-loop to every vertex so that no vertex is a sink
gremlin> g.V().as('B').addE('knows').from('B')
{code}
Then you will always get the correct sum, but I'm not sure it is a proper assumption.
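The difference between the two self-loop variants can be checked with a Python sketch. One variant gives only the sinks a self-loop, which is the "send the message to itself" idea. The other gives every vertex a self-loop, which is what the addE() workaround does. The edge map and update rule carry the same assumptions as the earlier sketches, and `run`, `sinks_only` and `all_loops` are hypothetical names. Both variants keep the total at 6.0, but they rank vertices differently. This is why the workaround may not be equivalent to a real fix.

```python
# Sketch comparing two ways of removing sinks on the modern graph.
# Assumptions: hand-transcribed edges and the inferred update rule used above.

# Out-edges of TinkerFactory.createModern() (knows + created), keyed by name.
MODERN = {
    "marko": ["vadas", "josh", "lop"],
    "josh": ["ripple", "lop"],
    "peter": ["lop"],
    "vadas": [], "lop": [], "ripple": [],  # sinks
}

def run(out_edges, times=40, alpha=0.85):
    ranks = {v: 1.0 for v in out_edges}
    for _ in range(times):
        incoming = {v: 0.0 for v in out_edges}
        for u, targets in out_edges.items():
            for v in targets:
                incoming[v] += ranks[u] / len(targets)
        ranks = {v: (1 - alpha) + alpha * incoming[v] for v in out_edges}
    return ranks

# 1) Only sinks keep their own rank (a self-loop on sinks only).
sinks_only = {u: (t if t else [u]) for u, t in MODERN.items()}
# 2) Every vertex gets a self-loop, like g.V().as('B').addE('knows').from('B').
all_loops = {u: t + [u] for u, t in MODERN.items()}

a = run(sinks_only)
b = run(all_loops)
print(round(sum(a.values()), 6), round(sum(b.values()), 6))  # both totals are 6.0
print(round(a["marko"], 4), round(b["marko"], 4))            # marko's rank differs
```

In the second variant, a non-sink vertex such as marko sends part of its rank back to itself, so its score rises above the 0.15 it gets when only sinks have self-loops.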

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
