[jira] [Comment Edited] (SPARK-21861) Add more details to PageRank illustration

Nikhil Bhide (JIRA) Fri, 01 Sep 2017 04:45:48 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-21861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150389#comment-16150389
 ]


Nikhil Bhide edited comment on SPARK-21861 at 9/1/17 11:44 AM:
---------------------------------------------------------------

Hi Sean,
Please find the additional content below. I have added a few comments to the 
description section (highlighted), and I have slightly modified the example 
(highlighted).
To summarize:
1. Added details about the damping factor and reset probability
2. Added details of the Personalized PageRank algorithm supported in GraphX
3. Modified the example
    - Sorted results in descending order by weight (rank)
    - Added an example of Personalized PageRank (PRR)


PageRank measures the importance of each vertex in a graph, assuming an edge 
from u to v represents an endorsement of v’s importance by u. For example, if a 
Twitter user is followed by many others, the user will be ranked 
highly. {color:red}*PageRank estimates the importance of a node from the number 
and quality of the links pointing to it.*{color}
GraphX comes with static and dynamic implementations of PageRank as methods on 
the PageRank object. Static PageRank runs for a fixed number of iterations, 
while dynamic PageRank runs until the ranks converge (i.e., stop changing by 
more than a specified tolerance). {color:red}The dynamic version (pageRank on 
GraphOps) takes two parameters, a tolerance and a reset probability, whereas 
the static version (staticPageRank on GraphOps) takes the number of iterations 
and a reset probability. The reset probability is the complement of the damping 
factor, which is the click-through probability. PageRank is based on the random 
surfer model, and the damping factor is the probability that the surfer keeps 
following links instead of jumping to another page. The damping factor ranges 
between 0 and 1. By default, the reset probability is 0.15, which corresponds 
to a damping factor of 0.85 (reset probability = 1 - damping factor).{color}
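
To make the relationship between the reset probability and the damping factor concrete, here is a minimal plain-Scala sketch of the random-surfer update. It is only an illustration under stated assumptions, not the GraphX implementation: the object name PageRankSketch, the integer vertex ids, and the choice to start every vertex at rank 1.0 are all hypothetical.

```scala
// Plain-Scala sketch of the random-surfer update; this is not GraphX code.
object PageRankSketch {
  // edges: (source, destination) pairs with hypothetical integer vertex ids.
  def pageRank(
      edges: Seq[(Int, Int)],
      numIter: Int,
      resetProb: Double = 0.15): Map[Int, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    // Assumption: every vertex starts with rank 1.0.
    var ranks = vertices.map(v => v -> 1.0).toMap
    for (_ <- 1 to numIter) {
      // Each vertex splits its current rank evenly across its out-edges.
      val contribs = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }
      // resetProb is the chance of a random jump; the damping factor
      // (1 - resetProb) weights the rank received by following links.
      ranks = vertices.map { v =>
        v -> (resetProb + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

In this sketch a vertex with no incoming edges ends up with exactly the reset probability as its rank, which is one way to see what the parameter contributes.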
{color:red}GraphX also supports Personalized PageRank (PRR), a more general 
version of PageRank. PRR is widely used in recommendation systems. For 
example, Twitter uses PRR to present users with other accounts that they may 
wish to follow. GraphX provides static and dynamic implementations of 
Personalized PageRank as methods on the PageRank object.{color}
GraphOps allows calling these algorithms directly as methods on Graph.

    import org.apache.spark.graphx.GraphLoader

    // Load the edges as a graph
    val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
    // Run PageRank
    val ranks = graph.pageRank(0.0001).vertices
    // Join the ranks with the usernames
    val users = sc.textFile("data/graphx/users.txt").map { line =>
      val fields = line.split(",")
      (fields(0).toLong, fields(1))
    }
    val ranksByUsername = users.join(ranks).map {
      case (id, (username, rank)) => (username, rank)
    }
    // Print the result, sorted by rank in descending order
    println(ranksByUsername.sortBy(_._2, ascending = false)
      .collect().mkString("\n"))

    // Run Personalized PageRank, using the first vertex as the source
    val ranksPRR = graph
      .personalizedPageRank(graph.vertices.first()._1, 0.0001).vertices
    val ranksPRRByUsername = users.join(ranksPRR).map {
      case (id, (username, rank)) => (username, rank)
    }
    // Print the result, sorted by rank in descending order
    println(ranksPRRByUsername.sortBy(_._2, ascending = false)
      .collect().mkString("\n"))
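
For readers wondering how the personalized variant differs from plain PageRank, the following plain-Scala sketch may help. It rests on the usual formulation in which the random surfer always restarts at one chosen source vertex. It is not GraphX code: the object name PersonalizedPageRankSketch, the integer vertex ids, and the initial ranks (1.0 on the source, 0.0 elsewhere) are assumptions made for illustration.

```scala
// Plain-Scala sketch of a personalized random-surfer update; not GraphX code.
object PersonalizedPageRankSketch {
  // edges: (source, destination) pairs with hypothetical integer vertex ids.
  def personalizedPageRank(
      edges: Seq[(Int, Int)],
      source: Int,
      numIter: Int,
      resetProb: Double = 0.15): Map[Int, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    // Assumption: all of the initial rank sits on the source vertex.
    var ranks = vertices.map(v => v -> (if (v == source) 1.0 else 0.0)).toMap
    for (_ <- 1 to numIter) {
      // Each vertex splits its current rank evenly across its out-edges.
      val contribs = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }
      ranks = vertices.map { v =>
        // Only the source vertex receives the reset term.
        val reset = if (v == source) resetProb else 0.0
        v -> (reset + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

Because only the source receives the reset term, ranks in this sketch measure importance relative to that vertex, which is why the approach suits "who to follow" style recommendations.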


> Add more details to PageRank illustration
> -----------------------------------------
>
>                 Key: SPARK-21861
>                 URL: https://issues.apache.org/jira/browse/SPARK-21861
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 2.2.0
>            Reporter: Nikhil Bhide
>            Priority: Trivial
>              Labels: documentation
>
> Add more details to the PageRank illustration at 
> [https://spark.apache.org/docs/latest/graphx-programming-guide.html#pagerank]
> Adding details of PageRank algorithm parameters, such as the damping factor, 
> would make the illustration more useful. Also, adding further actions on the 
> result, such as sorting by weight, would be helpful.



