Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-05-19 Thread Xiangrui Meng
In the implicit feedback model, the coefficients are already penalized (towards zero) by the number of unobserved ratings. So I think it is fair to keep the 1.3.0 weighting (by the number of total users/items). Again, I don't think we have a clear answer. It would be nice to run some experiments and

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-05-07 Thread Ravi Mody
After thinking about it more, I do think weighting lambda by sum_i c_ij is the equivalent of the ALS-WR paper's approach for the implicit case. This provides scale-invariance for varying products/users and for varying ratings, and should behave well for all alphas. What do you guys think? On Wed,

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-05-06 Thread Ravi Mody
Whoops I just saw this thread, it got caught in my spam filter. Thanks for looking into this Xiangrui and Sean. The implicit situation does seem fairly complicated to me. The cost function (not including the regularization term) is affected both by the number of ratings and by the number of

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-04-01 Thread Xiangrui Meng
Ravi, we just merged https://issues.apache.org/jira/browse/SPARK-6642 and used the same lambda scaling as in 1.2. The change will be included in Spark 1.3.1, which will be released soon. Thanks for reporting this issue! -Xiangrui On Tue, Mar 31, 2015 at 8:53 PM, Xiangrui Meng men...@gmail.com

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-31 Thread Sean Owen
I had always understood the formulation to be the first option you describe. Lambda is scaled by the number of items the user has rated / interacted with. I think the goal is to avoid fitting the tastes of prolific users disproportionately just because they have many ratings to fit. This is what's

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-31 Thread Xiangrui Meng
I created a JIRA for this: https://issues.apache.org/jira/browse/SPARK-6637. Since we don't have a clear answer about how the scaling should be handled, maybe the best solution for now is to switch back to the 1.2 scaling. -Xiangrui On Tue, Mar 31, 2015 at 2:50 PM, Sean Owen so...@cloudera.com

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-31 Thread Xiangrui Meng
Hey Sean, That is true for the explicit model, but not for the implicit model. The ALS-WR paper doesn't cover the implicit model. In the implicit formulation, a sub-problem (for v_j) is:

    min_{v_j} \sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2

This is a sum for all i but not just the users who rate
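
To make the role of X in the sub-problem above concrete, here is a minimal sketch in plain Python for the rank-1 case, where u_i and v_j are scalars and the minimizer has the closed form v_j = (sum_i c_ij p_ij u_i) / (sum_i c_ij u_i^2 + lambda * X). It is not Spark's implementation. The helper names (`lambda_scales`, `solve_rank1`), the toy data, and the definitions c_ij = 1 + alpha * r_ij and p_ij = 1 if r_ij > 0 (the usual implicit-feedback convention) are assumptions made for illustration. The three candidate values of X correspond to the options discussed in this thread: the number of observed ratings, the total number of users/items, and sum_i c_ij.

```python
def lambda_scales(r, alpha=1.0):
    """Candidate values of X for one sub-problem (hypothetical helper).

    r holds the raw implicit ratings r_ij for a single item j over all users i;
    a zero means "unobserved".
    """
    return {
        # number of observed ratings in this sub-problem
        "num_ratings": sum(1 for r_i in r if r_i > 0),
        # total number of users (or items), observed or not
        "num_total": len(r),
        # sum of confidences, assuming c_ij = 1 + alpha * r_ij
        "sum_confidence": sum(1.0 + alpha * r_i for r_i in r),
    }


def solve_rank1(u, r, lam, scale, alpha=1.0):
    """Minimize sum_i c_i (p_i - u_i * v)^2 + lam * scale * v^2 over scalar v."""
    c = [1.0 + alpha * r_i for r_i in r]        # assumed confidence weights
    p = [1.0 if r_i > 0 else 0.0 for r_i in r]  # assumed binary preferences
    numerator = sum(c_i * p_i * u_i for c_i, p_i, u_i in zip(c, p, u))
    denominator = sum(c_i * u_i * u_i for c_i, u_i in zip(c, u)) + lam * scale
    return numerator / denominator


# Toy data (made up): five users, two of whom interacted with item j.
u = [0.5, -0.2, 0.1, 0.3, 0.4]
r = [3.0, 0.0, 0.0, 1.0, 0.0]

scales = lambda_scales(r)
solutions = {name: solve_rank1(u, r, lam=0.1, scale=x) for name, x in scales.items()}

for name in scales:
    print(name, scales[name], solutions[name])
```

For a fixed lambda, a larger X adds more to the denominator and so pulls v_j closer to zero, which is why the choice of X changes the magnitude of the learned factors.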

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-30 Thread Xiangrui Meng
Okay, I didn't realize that I changed the behavior of lambda in 1.3 to make it scale-invariant, but it is worth discussing whether this is a good change. In 1.2, we multiply lambda by the number of ratings in each sub-problem. This makes it scale-invariant for explicit feedback. However, in implicit

Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-27 Thread Xiangrui Meng
This sounds like a bug ... Did you try a different lambda? It would be great if you could share your dataset or reproduce this issue on a public dataset. Thanks! -Xiangrui On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody rmody...@gmail.com wrote: After upgrading to 1.3.0, ALS.trainImplicit() has been

Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-26 Thread Ravi Mody
After upgrading to 1.3.0, ALS.trainImplicit() has been returning vastly smaller factors (and hence scores). For example, the first few product's factor values in 1.2.0 are (0.04821, -0.00674, -0.0325). In 1.3.0, the first few factor values are (2.535456E-8, 1.690301E-8, 6.99245E-8). This