[ https://issues.apache.org/jira/browse/SPARK-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482063#comment-14482063 ]
Reynold Xin commented on SPARK-6710:
------------------------------------

[~michaelmalak] would you like to submit a pull request for this?

> Wrong initial bias in GraphX SVDPlusPlus
> ----------------------------------------
>
>                 Key: SPARK-6710
>                 URL: https://issues.apache.org/jira/browse/SPARK-6710
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.3.0
>            Reporter: Michael Malak
>              Labels: easyfix
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> In the initialization portion of GraphX SVDPlusPlus, the initialization of
> the biases appears to be incorrect. Specifically, in the line
> https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96
> instead of
> (vd._1, vd._2, msg.get._2 / msg.get._1, 1.0 / scala.math.sqrt(msg.get._1))
> it should probably be
> (vd._1, vd._2, msg.get._2 / msg.get._1 - u, 1.0 / scala.math.sqrt(msg.get._1))
> That is, the biases bu and bi (both represented as the third component of
> the Tuple4 above, depending on whether the vertex is a user or an item),
> described in equation (1) of the Koren paper, are supposed to be small
> offsets from the mean (represented by the variable u, signifying the Greek
> letter mu) that account for the peculiarities of individual users and items.
> Initializing these biases to the wrong values should theoretically not
> matter given enough iterations of the algorithm, but some quick empirical
> testing shows the algorithm then has trouble converging at all, even with
> many orders of magnitude more iterations.
> This could perhaps be the source of previously reported trouble with
> SVDPlusPlus:
> http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-SVDPlusPlus-problem-td12885.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
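The proposed change can be illustrated in isolation. The sketch below is not the actual SVDPlusPlus code; it assumes, following the shape of the expressions quoted above, that the aggregated message for a vertex is a pair `(ratingCount, ratingSum)` and that `u` is the global mean rating. The bias (third tuple component) is then the vertex's mean rating offset from `u`, rather than the raw mean:

```scala
object BiasInitSketch {
  // Hypothetical vertex initializer mirroring the quoted fix.
  // vd:  existing per-vertex factor vectors (shapes are illustrative)
  // msg: aggregated (ratingCount, ratingSum) for this vertex, if any
  // u:   global mean rating (the Greek mu in Koren's equation (1))
  def initVertex(
      vd: (Array[Double], Array[Double]),
      msg: Option[(Long, Double)],
      u: Double): (Array[Double], Array[Double], Double, Double) = {
    val (count, sum) = msg.get
    // Corrected bias: mean rating for this vertex MINUS the global mean,
    // so the bias is a small offset around zero, not an absolute mean.
    (vd._1, vd._2, sum / count - u, 1.0 / scala.math.sqrt(count.toDouble))
  }

  def main(args: Array[String]): Unit = {
    val vd = (Array(0.1), Array(0.2))
    // 4 ratings summing to 14.0 -> vertex mean 3.5; global mean u = 3.0
    val out = initVertex(vd, Some((4L, 14.0)), u = 3.0)
    println(out._3) // bias = 3.5 - 3.0 = 0.5
    println(out._4) // 1 / sqrt(4) = 0.5
  }
}
```

With the unpatched formula the third component would be 3.5 (the raw mean), which seeds every bias far from its converged value and, per the report above, stalls convergence.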