[ https://issues.apache.org/jira/browse/SPARK-21861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150389#comment-16150389 ]
Nikhil Bhide edited comment on SPARK-21861 at 9/1/17 11:53 AM:
---------------------------------------------------------------

Hi Sean,

Please find the additional content below. I have added a few comments to the description section (highlighted) and slightly modified the example (highlighted). To summarize:

1. Added details about the damping factor and reset probability.
2. Added details of the Personalized PageRank algorithm supported in GraphX.
3. Modified the example:
- Sorted the results in descending order of rank.
- Added an example of Personalized PageRank (PPR).

PageRank measures the importance of each vertex in a graph, assuming an edge from u to v represents an endorsement of v’s importance by u. For example, if a Twitter user is followed by many others, the user will be ranked highly. {color:red}*PageRank estimates the importance of a node from the number and quality of the links pointing to it.*{color}

GraphX comes with static and dynamic implementations of PageRank as methods on the PageRank object. Static PageRank runs for a fixed number of iterations, while dynamic PageRank runs until the ranks converge (i.e., stop changing by more than a specified tolerance). {color:red}The dynamic version (pageRank on GraphOps, which calls PageRank.runUntilConvergence) takes two parameters: a convergence tolerance and a reset probability. The static version (staticPageRank on GraphOps, which calls PageRank.run) takes the number of iterations and a reset probability. The reset probability is tied to the damping factor, the probability that the surfer keeps clicking through links. PageRank is based on the random surfer model: at each step the surfer follows an outgoing link with probability equal to the damping factor and otherwise jumps (resets) to a random page. The damping factor lies between 0 and 1. It is conventionally set to 0.85, and the reset probability is 1 - damping factor, which is why GraphX's resetProb defaults to 0.15.{color}

{color:red}GraphX also supports Personalized PageRank (PPR), a more general version of PageRank in which the random surfer always resets to a given source vertex instead of a random one, so the ranks measure importance relative to that source. PPR is widely used in recommendation systems.
For example, Twitter uses PPR to suggest other accounts that a user may wish to follow. GraphX provides static and dynamic implementations of Personalized PageRank as methods on the PageRank object.{color}

GraphOps allows calling these algorithms directly as methods on Graph.

GraphX also includes an example social network dataset that we can run PageRank on. A set of users is given in data/graphx/users.txt, and a set of relationships between users is given in data/graphx/followers.txt. We compute the PageRank of each user as follows:

Code changes - PageRankExample.scala

was (Author: nikbhi15):

We compute the PageRank of each user as follows:

```scala
import org.apache.spark.graphx.GraphLoader

// Load the edges as a graph (sc is the SparkContext provided by spark-shell)
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Run dynamic PageRank until the ranks change by less than the tolerance
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
val users = sc.textFile("data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val ranksByUsername = users.join(ranks).map {
  case (_, (username, rank)) => (username, rank)
}
// Print the result, sorted by rank in descending order
println(ranksByUsername.sortBy({ case (_, rank) => rank }, ascending = false).collect().mkString("\n"))

// Run Personalized PageRank with the first vertex as the source vertex
val sourceId = graph.vertices.first._1
val ranksPPR = graph.personalizedPageRank(sourceId, 0.0001).vertices
val ranksPPRByUsername = users.join(ranksPPR).map {
  case (_, (username, rank)) => (username, rank)
}
// Print the result, sorted by rank in descending order
println(ranksPPRByUsername.sortBy({ case (_, rank) => rank }, ascending = false).collect().mkString("\n"))
```

> Add more details to PageRank illustration
> -----------------------------------------
>
>                 Key: SPARK-21861
>                 URL: https://issues.apache.org/jira/browse/SPARK-21861
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 2.2.0
>            Reporter: Nikhil Bhide
>            Priority: Trivial
>              Labels: documentation
>
> Add more details to PageRank illustration on
> [https://spark.apache.org/docs/latest/graphx-programming-guide.html#pagerank]
> Adding details of PageRank algorithm parameters such as the damping factor
> would be very helpful. Also, adding more actions on the result, such as
> sorting by weight, would be useful.
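The issue asks for more detail on parameters such as the damping factor. To show what the reset probability, the iteration count and the tolerance actually do, here is a minimal plain-Scala sketch (no Spark) of the unnormalized update rule described in the GraphX PageRank scaladoc: PR(v) = resetProb + (1 - resetProb) * sum of PR(u) / outDeg(u) over all in-edges (u, v). PageRankSketch, staticRank and dynamicRank are hypothetical names used only for illustration, not GraphX APIs. The personalized variant assumes that the reset mass goes only to the source vertex. The starting ranks (1.0 everywhere, or 1.0 only at the source) and the max-change stopping rule are simplifying assumptions of this sketch, not GraphX's exact implementation.

```scala
// Minimal plain-Scala sketch (no Spark) of unnormalized PageRank with a reset
// probability. Hypothetical names; this is not GraphX's implementation.
object PageRankSketch {
  type Edge = (Long, Long) // (srcId, dstId)

  // One update: PR(v) = reset(v) + (1 - resetProb) * sum of PR(u) / outDeg(u)
  // over all edges (u, v) pointing into v.
  private def step(
      edges: Seq[Edge],
      vertices: Seq[Long],
      outDeg: Map[Long, Int],
      ranks: Map[Long, Double],
      resetProb: Double,
      source: Option[Long]): Map[Long, Double] = {
    val incoming: Map[Long, Double] = edges.groupBy(_._2).map { case (dst, inEdges) =>
      dst -> inEdges.map { case (src, _) => ranks(src) / outDeg(src) }.sum
    }
    vertices.map { v =>
      val reset = source match {
        case Some(s) => if (v == s) resetProb else 0.0 // personalized: reset only to the source
        case None    => resetProb                       // standard: reset to every vertex
      }
      v -> (reset + (1.0 - resetProb) * incoming.getOrElse(v, 0.0))
    }.toMap
  }

  // Vertices come from the edge list; the initial rank is 1.0
  // (only at the source vertex in the personalized case).
  private def setup(edges: Seq[Edge], source: Option[Long]) = {
    val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    val outDeg = edges.groupBy(_._1).map { case (v, out) => v -> out.size }
    val ranks = vertices.map(v => v -> (if (source.forall(_ == v)) 1.0 else 0.0)).toMap
    (vertices, outDeg, ranks)
  }

  // "Static" variant: run a fixed number of iterations.
  def staticRank(edges: Seq[Edge], numIter: Int, resetProb: Double = 0.15,
                 source: Option[Long] = None): Map[Long, Double] = {
    val (vertices, outDeg, initial) = setup(edges, source)
    (1 to numIter).foldLeft(initial) { (ranks, _) =>
      step(edges, vertices, outDeg, ranks, resetProb, source)
    }
  }

  // "Dynamic" variant: iterate until no rank changes by more than tol
  // (a simplified stopping rule).
  def dynamicRank(edges: Seq[Edge], tol: Double, resetProb: Double = 0.15,
                  source: Option[Long] = None): Map[Long, Double] = {
    val (vertices, outDeg, initial) = setup(edges, source)
    var ranks = initial
    var delta = Double.MaxValue
    while (delta > tol) {
      val next = step(edges, vertices, outDeg, ranks, resetProb, source)
      delta = vertices.foldLeft(0.0)((m, v) => math.max(m, math.abs(next(v) - ranks(v))))
      ranks = next
    }
    ranks
  }
}
```

For example, on the edges (2, 1) and (3, 1), dynamicRank with tol = 0.0001 leaves vertices 2 and 3, which have no in-links, at exactly resetProb (0.15) and gives vertex 1 about 0.405. This matches the scaladoc's remark that these ranks are not normalized. Calling .toSeq.sortBy { case (_, rank) => -rank } on the result mirrors the descending sort in the example above, and passing source = Some(id) gives a personalized ranking relative to that vertex.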