[
https://issues.apache.org/jira/browse/TINKERPOP-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170158#comment-16170158
]
Marko A. Rodriguez commented on TINKERPOP-1783:
-----------------------------------------------
So, there were two problems with the PageRank implementation.
1. The teleportation energy was not being distributed correctly. We now have a
{{Memory}} variable called {{TELEPORTATION_ENERGY}}.
2. Vertices without outgoing edges were not sending their stored energy to
teleportation. Thus, total energy was being lost over time.
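Problem 2 can be modeled in a few lines of Python: under a naive update rule, a sink's alpha-share of energy is never forwarded anywhere, so the total shrinks on every iteration. This is an illustrative sketch using the modern graph's edge structure, not the actual {{PageRankVertexProgram}} code; `naive_step` and the edge dictionary are hypothetical names.

```python
# Illustrative model of the bug (not TinkerPop's actual code): each vertex
# keeps only its (1 - alpha) teleport share and forwards alpha/out-degree
# along each outgoing edge. Sinks have no outgoing edges, so their alpha
# share simply vanishes.
ALPHA = 0.85

# Edge structure of the TinkerPop "modern" graph; vadas, lop and ripple
# are sinks (no outgoing edges).
EDGES = {
    "marko": ["vadas", "josh", "lop"],
    "vadas": [],
    "lop": [],
    "josh": ["ripple", "lop"],
    "ripple": [],
    "peter": ["lop"],
}

def naive_step(rank):
    new = {v: 1 - ALPHA for v in rank}               # teleport share only
    for v, outs in EDGES.items():
        for w in outs:                               # sinks skip this loop:
            new[w] += ALPHA * rank[v] / len(outs)    # their energy is lost
    return new

rank = {v: 1.0 for v in EDGES}    # every vertex starts at 1.0; total = 6.0
rank = naive_step(rank)
print(sum(rank.values()))         # ~3.45 after one step instead of 6.0
```

One step drops the total from 6.0 to roughly 3.45, because the alpha-energy of the three sinks (plus the teleport discount) leaves the system instead of being pooled.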
There are slight variations between the iGraph and TinkerPop values, but both
maintain the standard normalization of 1.0.
{code}
VERTEX   iGRAPH      TINKERPOP
marko    0.1119788   0.11375485828040575
vadas    0.1370267   0.14598540145985406
lop      0.2665600   0.30472082661863686
josh     0.1620746   0.14598540145985406
ripple   0.2103812   0.1757986539008437
peter    0.1119788   0.11375485828040575
{code}
You can see the normalization is maintained:
{code}
gremlin> 0.11375485828040575 + 0.14598540145985406 + 0.30472082661863686 + 0.14598540145985406 + 0.1757986539008437 + 0.11375485828040575
==>1.00000000000000018
{code}
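The fixed energy flow can be sketched the same way: the teleportation pool collects both the usual (1 - alpha) share from every vertex and the entire alpha share held by sinks, then redistributes the pool evenly, so the total stays at 1.0. This is an illustrative Python model, not the actual {{PageRankVertexProgram}} implementation.

```python
# Sketch of the fixed energy flow: sink energy goes into a teleportation
# pool (cf. the TELEPORTATION_ENERGY Memory variable) instead of vanishing.
ALPHA = 0.85

# Edge structure of the TinkerPop "modern" graph; vadas, lop and ripple
# are sinks.
EDGES = {
    "marko": ["vadas", "josh", "lop"],
    "vadas": [],
    "lop": [],
    "josh": ["ripple", "lop"],
    "ripple": [],
    "peter": ["lop"],
}

def pagerank(iterations=40):
    n = len(EDGES)
    rank = {v: 1.0 / n for v in EDGES}
    for _ in range(iterations):
        # Teleportation pool: the (1 - alpha) share of every vertex, plus
        # the whole alpha share of vertices with no outgoing edges.
        pool = (1 - ALPHA) * sum(rank.values())
        pool += ALPHA * sum(r for v, r in rank.items() if not EDGES[v])
        new = {v: pool / n for v in EDGES}
        for v, outs in EDGES.items():
            for w in outs:
                new[w] += ALPHA * rank[v] / len(outs)
        rank = new
    return rank

ranks = pagerank()
print(round(sum(ranks.values()), 10))   # 1.0 -- energy is conserved
```

At convergence this model approaches the TinkerPop column above (e.g. marko near 0.1138), since marko and peter receive energy only from the teleportation pool.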
> PageRank gives incorrect results for graphs with sinks
> ------------------------------------------------------
>
> Key: TINKERPOP-1783
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1783
> Project: TinkerPop
> Issue Type: Bug
> Components: process
> Affects Versions: 3.3.0, 3.1.8, 3.2.6
> Reporter: Artem Aliev
> Assignee: Marko A. Rodriguez
>
> {quote} Sink vertices (those with no outgoing edges) should evenly distribute
> their rank to the entire graph but in the current implementation it is just
> lost.
> {quote}
> Wiki: https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm
> {quote} In the original form of PageRank, the sum of PageRank over all pages
> was the total number of pages on the web at that time
> {quote}
> I found the issue while comparing results with Spark GraphX.
> So this is a copy of https://issues.apache.org/jira/browse/SPARK-18847
> How to reproduce:
> {code}
> gremlin> graph = TinkerFactory.createModern()
> gremlin> g = graph.traversal().withComputer()
> gremlin> g.V().pageRank(0.85).times(40).by('pageRank').values('pageRank').sum()
> ==>1.318625
> gremlin> g.V().pageRank(0.85).times(1).by('pageRank').values('pageRank').sum()
> ==>3.4499999999999997
> # initial values:
> gremlin> g.V().pageRank(0.85).times(0).by('pageRank').values('pageRank').sum()
> ==>6.0
> {code}
> They fixed the issue by normalising values after each step.
> The other way to fix it is to send the message to itself (i.e., stay on the
> same page).
> To work around the problem, just add self-pointing edges:
> {code}
> gremlin> g.V().as('B').addE('knows').from('B')
> {code}
> Then you'll always get the correct sum, but I'm not sure it is a proper
> assumption.
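The workaround quoted above can be checked with the same kind of illustrative model: once every vertex has a self-loop, no vertex is a sink, so the alpha share is always forwarded somewhere and the total no longer decays. This is hypothetical Python over the modern graph's edge structure, not TinkerPop code.

```python
# Sketch of the self-loop workaround under the lossy update rule: with a
# self-loop on every vertex there are no sinks, so the alpha-energy is
# always forwarded and the total stays constant.
ALPHA = 0.85

EDGES = {
    "marko": ["vadas", "josh", "lop"],
    "vadas": [],
    "lop": [],
    "josh": ["ripple", "lop"],
    "ripple": [],
    "peter": ["lop"],
}
# Add a self-loop to every vertex, as the workaround traversal does.
LOOPED = {v: outs + [v] for v, outs in EDGES.items()}

rank = {v: 1.0 for v in LOOPED}
for _ in range(40):
    new = {v: 1 - ALPHA for v in rank}
    for v, outs in LOOPED.items():
        for w in outs:
            new[w] += ALPHA * rank[v] / len(outs)
    rank = new
print(round(sum(rank.values()), 6))   # stays at 6.0
```

The sum is conserved, but each vertex now feeds part of its own rank back to itself, so the stationary distribution differs from one computed with proper sink handling, which is likely the reason for the reporter's hesitation about the assumption.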
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)