[ https://issues.apache.org/jira/browse/SPARK-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Fontana updated SPARK-3206:
---------------------------------
    Description:
I have found a small example where the PageRank values using run and runUntilConvergence differ quite a bit.

I am running the Pagerank module on the following graph:

Edge Table:
| Node1 | Node2 |
|1 | 2 |
|1 | 3|
|3 | 2|
|3 | 4|
|5 | 3|
|6 | 7|
|7 | 8|
|8 | 9|
|9 | 7|

Node Table (note the extra node):
| NodeID | NodeName |
|a | 1|
|b | 2|
|c | 3|
|d | 4|
|e | 5|
|f | 6|
|g | 7|
|h | 8|
|i | 9|
|j.longaddress.com | 10|

with a default resetProb of 0.15.

When I compute the pageRank with runUntilConvergence, running
```
val ranks = PageRank.runUntilConvergence(graph,0.0001).vertices
```
I get the ranks

(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,1.3299054047985106)
(9,1.2381240056453071)
(8,1.2803346052504254)
(10,0.15)
(5,0.15)
(2,0.35878124999999994)

However, when I run page Rank with the run() method, running

val ranksI = PageRank.run(graph,100).vertices

I get the page ranks

(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,0.9999999387662847)
(9,0.9999999256447741)
(8,0.9999999256447741)
(10,0.15)
(5,0.15)
(2,0.29503124999999997)

These are quite different, leading me to suspect that one of the PageRank methods is incorrect. I have examined the source, but I do not know what the correct fix is, or which set of values is correct.

  was:
I have found a small example where the PageRank values using run and runUntilConvergence differ quite a bit.

I am running the Pagerank module on the following graph:

Edge Table:
| Node1 | Node2 |
|1 | 2 |
|1 | 3|
3 | 2
3 | 4
5 | 3
6 | 7
7 | 8
8 | 9
9 | 7

Node Table (note the extra node):
| NodeID | NodeName |
| ------------- | ------------- |
a | 1
b | 2
c | 3
d | 4
e | 5
f | 6
g | 7
h | 8
i | 9
j.longaddress.com | 10

with a default resetProb of 0.15.
When I compute the pageRank with runUntilConvergence, running

val ranks = PageRank.runUntilConvergence(graph,0.0001).vertices

I get the ranks

(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,1.3299054047985106)
(9,1.2381240056453071)
(8,1.2803346052504254)
(10,0.15)
(5,0.15)
(2,0.35878124999999994)

However, when I run page Rank with the run() method, running

val ranksI = PageRank.run(graph,100).vertices

I get the page ranks

(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,0.9999999387662847)
(9,0.9999999256447741)
(8,0.9999999256447741)
(10,0.15)
(5,0.15)
(2,0.29503124999999997)

These are quite different, leading me to suspect that one of the PageRank methods is incorrect. I have examined the source, but I do not know what the correct fix is, or which set of values is correct.


> Error in PageRank values
> ------------------------
>
>                 Key: SPARK-3206
>                 URL: https://issues.apache.org/jira/browse/SPARK-3206
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.0.2
>         Environment: UNIX with Hadoop
>            Reporter: Peter Fontana
>
> I have found a small example where the PageRank values using run and
> runUntilConvergence differ quite a bit.
> I am running the Pagerank module on the following graph:
> Edge Table:
> | Node1 | Node2 |
> |1 | 2 |
> |1 | 3|
> |3 | 2|
> |3 | 4|
> |5 | 3|
> |6 | 7|
> |7 | 8|
> |8 | 9|
> |9 | 7|
> Node Table (note the extra node):
> | NodeID | NodeName |
> |a | 1|
> |b | 2|
> |c | 3|
> |d | 4|
> |e | 5|
> |f | 6|
> |g | 7|
> |h | 8|
> |i | 9|
> |j.longaddress.com | 10|
> with a default resetProb of 0.15.
> When I compute the pageRank with runUntilConvergence, running
> ```
> val ranks = PageRank.runUntilConvergence(graph,0.0001).vertices
> ```
> I get the ranks
> (4,0.29503124999999997)
> (1,0.15)
> (6,0.15)
> (3,0.34124999999999994)
> (7,1.3299054047985106)
> (9,1.2381240056453071)
> (8,1.2803346052504254)
> (10,0.15)
> (5,0.15)
> (2,0.35878124999999994)
> However, when I run page Rank with the run() method, running val ranksI =
> PageRank.run(graph,100).vertices I get the page ranks
> (4,0.29503124999999997)
> (1,0.15)
> (6,0.15)
> (3,0.34124999999999994)
> (7,0.9999999387662847)
> (9,0.9999999256447741)
> (8,0.9999999256447741)
> (10,0.15)
> (5,0.15)
> (2,0.29503124999999997)
> These are quite different, leading me to suspect that one of the PageRank
> methods is incorrect. I have examined the source, but I do not know what the
> correct fix is, or which set of values is correct.
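
To help judge which of the two result sets is plausible, the ranks for the reported graph can be recomputed by hand with a plain power iteration that does not touch Spark at all. The sketch below is only a cross-check, not the GraphX implementation. It assumes the un-normalized update rule rank(v) = resetProb + (1 - resetProb) * sum(rank(u) / outDegree(u)) over the in-edges (u -> v), and an initial rank of 1.0 for every vertex. The names PageRankCheck and reference are hypothetical and exist only for this illustration.

```scala
// Plain-Scala power iteration on the graph from the report (no Spark needed).
// Assumption (not taken from the GraphX source): the un-normalized update
//   rank(v) = resetProb + (1 - resetProb) * sum over in-edges (u -> v) of rank(u) / outDegree(u)
object PageRankCheck {
  // Edge list copied from the Edge Table in the report.
  val edges: Seq[(Long, Long)] = Seq(
    (1L, 2L), (1L, 3L), (3L, 2L), (3L, 4L), (5L, 3L),
    (6L, 7L), (7L, 8L), (8L, 9L), (9L, 7L))

  // Vertex ids from the Node Table, including the isolated vertex 10.
  val vertices: Seq[Long] = 1L to 10L

  // Hypothetical helper: runs numIter synchronous updates and returns the ranks.
  def reference(numIter: Int, resetProb: Double = 0.15): Map[Long, Double] = {
    // Out-degree of every vertex that has at least one outgoing edge.
    val outDegree: Map[Long, Int] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.size }

    // Assumed starting point: every vertex begins with rank 1.0.
    var ranks: Map[Long, Double] = vertices.map(v => v -> 1.0).toMap

    for (_ <- 1 to numIter) {
      // Sum the contributions arriving at each destination vertex.
      val incoming: Map[Long, Double] = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (dst, contribs) => dst -> contribs.map(_._2).sum }

      // Vertices without in-edges receive only the reset probability.
      ranks = vertices.map { v =>
        v -> (resetProb + (1.0 - resetProb) * incoming.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }

  def main(args: Array[String]): Unit =
    reference(numIter = 100).toSeq.sortBy(_._1).foreach(println)
}
```

Under this assumed rule, vertex 2 works out to 0.15 + 0.85 * (0.15/2 + 0.34125/2) = 0.35878125, which agrees with the runUntilConvergence output above and not with the run() output. That is a consistency check against one formulation of PageRank only; it does not by itself identify where the two GraphX code paths diverge.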