[ https://issues.apache.org/jira/browse/SPARK-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DB Tsai resolved SPARK-11432.
-----------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

Issue resolved by pull request 9386
[https://github.com/apache/spark/pull/9386]

> Personalized PageRank shouldn't use uniform initialization
> ----------------------------------------------------------
>
>                 Key: SPARK-11432
>                 URL: https://issues.apache.org/jira/browse/SPARK-11432
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.5.1
>            Reporter: Yves Raimond
>            Assignee: Yves Raimond
>            Priority: Minor
>             Fix For: 1.6.0
>
>
> The current implementation of personalized PageRank in GraphX uses uniform 
> initialization over the full graph: every vertex is activated initially, 
> not just the source vertex.
> For example:
> {code}
> import org.apache.spark._
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> val users: RDD[(VertexId, (String, String))] =
>   sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
>                        (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
> val relationships: RDD[Edge[String]] =
>   sc.parallelize(Array(Edge(3L, 7L, "collab"),    Edge(5L, 3L, "advisor"),
>                        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
> val defaultUser = ("John Doe", "Missing")
> val graph = Graph(users, relationships, defaultUser)
> graph.staticPersonalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
> {code}
> This sets every vertex to resetProb (0.15), which differs from the behavior 
> described in SPARK-5854, where only the source node should be activated. 
> The risk is that, after a few iterations, the most activated nodes are the 
> source node and the nodes that were never touched by the propagation. For 
> instance, in the graph above, vertex 2L always keeps an activation of 0.15:
> {code}
> graph.personalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
> {code}
> This gives 2L a higher score than 7L and 5L, even though there is no 
> outbound path from 3L to 2L.
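
To make the difference between the two initializations concrete, here is a minimal Spark-free sketch on the same four-vertex graph. It is not the GraphX code or the change in pull request 9386: the names InitSketch, uniformInit, sourceInit and step are hypothetical, and the update rule (a vertex that receives no message keeps its previous rank) is an assumption chosen to model why an unreached vertex never changes.

```scala
// Simplified model for illustration only; not the GraphX implementation.
object InitSketch {
  type VertexId = Long

  // Directed edges (src -> dst) of the example graph from the report.
  val edges: Seq[(VertexId, VertexId)] = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L))
  val vertices: Seq[VertexId] = Seq(2L, 3L, 5L, 7L)

  // Uniform initialization: every vertex starts at resetProb (the reported behavior).
  def uniformInit(resetProb: Double): Map[VertexId, Double] =
    vertices.map(v => v -> resetProb).toMap

  // Source-only initialization: only the source vertex is activated.
  def sourceInit(src: VertexId, resetProb: Double): Map[VertexId, Double] =
    vertices.map(v => v -> (if (v == src) resetProb else 0.0)).toMap

  // One sweep: each vertex splits its rank evenly over its out-edges.
  // Assumption: a vertex that receives no message keeps its previous rank.
  def step(ranks: Map[VertexId, Double], src: VertexId, resetProb: Double): Map[VertexId, Double] = {
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    val messages = edges
      .map { case (s, d) => d -> ranks(s) / outDegree(s) }
      .groupBy(_._1)
      .map { case (d, ms) => d -> ms.map(_._2).sum }
    ranks.map { case (v, old) =>
      val reset = if (v == src) resetProb else 0.0 // only the source gets the reset mass
      v -> messages.get(v).map(m => reset + (1.0 - resetProb) * m).getOrElse(old)
    }
  }
}
```

Vertex 2L has no incoming edge, so in this model it never receives a message. With uniformInit(0.15) it therefore stays at 0.15 after any number of calls to step, while with sourceInit(3L, 0.15) it stays at 0.0.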


