[
https://issues.apache.org/jira/browse/SPARK-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
DB Tsai resolved SPARK-11432.
-----------------------------
Resolution: Fixed
Fix Version/s: 1.6.0
Issue resolved by pull request 9386
[https://github.com/apache/spark/pull/9386]
> Personalized PageRank shouldn't use uniform initialization
> ----------------------------------------------------------
>
> Key: SPARK-11432
> URL: https://issues.apache.org/jira/browse/SPARK-11432
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Affects Versions: 1.5.1
> Reporter: Yves Raimond
> Assignee: Yves Raimond
> Priority: Minor
> Fix For: 1.6.0
>
>
> The current implementation of personalized PageRank in GraphX uses uniform
> initialization over the full graph: every vertex is activated initially.
> For example:
> {code}
> import org.apache.spark._
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> val users: RDD[(VertexId, (String, String))] =
>   sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
>     (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
> val relationships: RDD[Edge[String]] =
>   sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
>     Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
> val defaultUser = ("John Doe", "Missing")
> val graph = Graph(users, relationships, defaultUser)
> graph.staticPersonalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
> {code}
> This sets every vertex to resetProb (0.15), which differs from the behavior
> described in SPARK-5854, where only the source node should be activated.
> The risk is that, after a few iterations, the most activated nodes are the
> source node and the nodes that were untouched by the propagation. In the
> example above, vertex 2L will always have an activation of 0.15:
> {code}
> graph.personalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
> {code}
> This gives 2L a higher score than 7L and 5L, even though there is no
> outbound path from 3L to 2L.
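
The mechanism described above can be sketched without Spark. The following is a simplified toy model in plain Scala, not the GraphX implementation, so its scores are not expected to match what GraphX prints. It assumes that a vertex receiving no messages keeps its previous rank (which is what "untouched by the propagation" suggests) and that source-only initialization gives the source resetProb and every other vertex 0.0. The names PersonalizedPageRankInit, run, uniformInit and sourceOnlyInit are hypothetical.

```scala
object PersonalizedPageRankInit {
  type VertexId = Long

  // The example graph from the report, as (source, destination) pairs.
  val edges: Seq[(VertexId, VertexId)] = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L))
  val vertices: Seq[VertexId] = Seq(2L, 3L, 5L, 7L)
  val outDegree: Map[VertexId, Int] =
    edges.groupBy(_._1).map { case (v, es) => v -> es.size }

  // Uniform initialization: every vertex starts at resetProb.
  def uniformInit(resetProb: Double): Map[VertexId, Double] =
    vertices.map(v => v -> resetProb).toMap

  // Source-only initialization (assumed values): src gets resetProb, the rest 0.0.
  def sourceOnlyInit(src: VertexId, resetProb: Double): Map[VertexId, Double] =
    vertices.map(v => v -> (if (v == src) resetProb else 0.0)).toMap

  // Toy propagation: only vertices with incoming messages are updated,
  // and only the source vertex receives the reset term.
  def run(init: Map[VertexId, Double], src: VertexId,
          numIter: Int, resetProb: Double): Map[VertexId, Double] =
    (1 to numIter).foldLeft(init) { (ranks, _) =>
      // Rank mass arriving at each destination, split evenly over out-edges.
      val incoming: Map[VertexId, Double] =
        edges.groupBy(_._2).map { case (dst, es) =>
          dst -> es.map { case (s, _) => ranks(s) / outDegree(s) }.sum
        }
      ranks.map { case (v, old) =>
        incoming.get(v) match {
          case Some(msgSum) =>
            val reset = if (v == src) resetProb else 0.0
            v -> (reset + (1.0 - resetProb) * msgSum)
          case None => v -> old // no in-edges: the initial value is never overwritten
        }
      }
    }

  def main(args: Array[String]): Unit = {
    val resetProb = 0.15
    println("uniform initialization:")
    run(uniformInit(resetProb), 3L, 10, resetProb).toSeq.sortBy(_._1).foreach(println)
    println("source-only initialization:")
    run(sourceOnlyInit(3L, resetProb), 3L, 10, resetProb).toSeq.sortBy(_._1).foreach(println)
  }
}
```

In this toy model, uniform initialization leaves 2L at 0.15 indefinitely because it has no in-edges, which puts it above 5L. With source-only initialization 2L stays at 0.0, and only 3L and the vertex reachable from it (7L) end up with a non-zero score.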