Michael Malak created SPARK-6710:
------------------------------------

             Summary: Wrong initial bias in GraphX SVDPlusPlus
                 Key: SPARK-6710
                 URL: https://issues.apache.org/jira/browse/SPARK-6710
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.3.0
            Reporter: Michael Malak


In the initialization portion of GraphX SVDPlusPlus, the biases appear to be 
initialized incorrectly. Specifically, at line 
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96
 
instead of 
(vd._1, vd._2, msg.get._2 / msg.get._1, 1.0 / scala.math.sqrt(msg.get._1)) 
it should probably be 
(vd._1, vd._2, msg.get._2 / msg.get._1 - u, 1.0 / scala.math.sqrt(msg.get._1)) 

That is, the biases bu and bi (both represented as the third component of the 
Tuple4[] above, depending on whether the vertex is a user or an item), 
described in equation (1) of the Koren paper, are supposed to be small offsets 
to the mean (represented by the variable u, signifying the Greek letter mu) to 
account for peculiarities of individual users and items. 
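
As a rough illustration of the proposed change, here is a minimal, Spark-free 
sketch of the per-vertex initialization. The object and method names are 
hypothetical and not taken from SVDPlusPlus.scala; msg is assumed to be a 
(count of ratings, sum of ratings) pair for one vertex, and u the global mean 
rating, as in the expressions quoted above. 

```scala
// Hypothetical sketch; names are illustrative, not from SVDPlusPlus.scala.
object BiasInit {
  /** Global mean rating (mu in the Koren paper, `u` in the issue text). */
  def globalMean(ratings: Seq[Double]): Double =
    ratings.sum / ratings.size

  /**
   * Proposed initial bias for one vertex (user or item).
   * msg = (number of ratings on the vertex, sum of those ratings).
   * The vertex's mean rating is shifted by the global mean u, so the
   * bias starts out as a small offset rather than a full rating.
   */
  def initialBias(msg: (Long, Double), u: Double): Double =
    msg._2 / msg._1 - u

  /** Normalization term, same as in the expressions quoted above. */
  def norm(msg: (Long, Double)): Double =
    1.0 / scala.math.sqrt(msg._1.toDouble)
}
```

For example, with a global mean of 4.0, a vertex with two ratings summing to 
9.0 gets an initial bias of 0.5 under this sketch, where the current 
expression would give 4.5. 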

In theory, initializing these biases to wrong values should not matter given 
enough iterations of the algorithm, but some quick empirical testing shows that 
it has trouble converging at all, even with orders of magnitude more 
iterations. 
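
To make the size of the initial error concrete, here is a short worked 
comparison. It assumes equation (1) of the Koren paper is the baseline 
estimate b_ui = mu + b_u + b_i, and it uses \bar{r}_u and \bar{r}_i (notation 
introduced here, not in the source) for the mean rating of user u and of 
item i. 

```latex
% Baseline estimate, equation (1) of the Koren paper (as assumed above):
b_{ui} = \mu + b_u + b_i

% Current initialization: b_u = \bar{r}_u, \; b_i = \bar{r}_i
b_{ui} = \mu + \bar{r}_u + \bar{r}_i \approx 3\mu
  \quad \text{when } \bar{r}_u \approx \bar{r}_i \approx \mu

% Proposed initialization: b_u = \bar{r}_u - \mu, \; b_i = \bar{r}_i - \mu
b_{ui} = \mu + (\bar{r}_u - \mu) + (\bar{r}_i - \mu)
       = \bar{r}_u + \bar{r}_i - \mu \approx \mu
```

Under these assumptions the current initialization starts the baseline near 
three times the mean rating, while the proposed one starts it near the mean. 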

This could perhaps be the source of the previously reported trouble with 
SVDPlusPlus: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-SVDPlusPlus-problem-td12885.html
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
