Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-05-19 Thread Xiangrui Meng
In the implicit feedback model, the coefficients are already penalized
(towards zero) by the number of unobserved ratings. So I think it is
fair to keep the 1.3.0 weighting (by the total number of users/items).
Again, I don't think we have a clear answer. It would be nice to run
some experiments and see which works better. -Xiangrui
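
(For context on that first point: in Hu, Koren, and Volinsky's
implicit-feedback formulation, which trainImplicit follows, the
confidence is c_ij = 1 + alpha * r_ij and p_ij is 1 for observed pairs
and 0 otherwise. Every unobserved pair therefore still contributes a
baseline-confidence term

  \sum_{i : r_ij = 0} (0 - u_i^T v_j)^2

to item j's sub-problem, which already acts like a data-dependent
ridge penalty shrinking v_j towards zero.)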


Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-05-07 Thread Ravi Mody
After thinking about it more, I do think weighting lambda by sum_i c_ij
is the equivalent of the ALS-WR paper's approach for the implicit case.
This provides scale invariance for varying numbers of users/products
and for varying ratings, and should behave well for all alphas. What do
you guys think?
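
(To spell out why this might behave well for all alphas, using the
standard confidence c_ij = 1 + alpha * r_ij:

  \sum_i c_ij = \sum_i (1 + alpha * r_ij) = m + alpha * \sum_i r_ij

where m is the total number of users. As alpha -> 0 this tends to m,
which is exactly the 1.3.0 weighting; for large alpha it is dominated
by the observed-rating mass alpha * \sum_i r_ij, which scales with the
ratings like the 1.2 weighting.)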


Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-05-06 Thread Ravi Mody
Whoops, I just saw this thread; it got caught in my spam filter. Thanks
for looking into this, Xiangrui and Sean.

The implicit situation does seem fairly complicated to me. The cost
function (not including the regularization term) is affected both by
the number of ratings and by the number of users/products. As we
increase alpha, the contribution to the cost function from the number
of users/products diminishes compared to the contribution from the
number of ratings. So large alphas seem to favor the weighted-lambda
approach, even though it's not a perfect match. Smaller alphas favor
Xiangrui's 1.3.0 approach, but again it's not a perfect match.

I believe low alphas won't work well with regularization because both
terms in the cost function will just push everything to zero. Some of
my experiments confirm this. This leads me to think that
weighted-lambda would work better in practice, but I have no evidence
of this. It may make sense to weight lambda by sum_i c_ij instead?
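
(A rough way to see the low-alpha failure: as alpha -> 0, every c_ij
tends to 1 and the loss tends to \sum over all pairs (i, j) of
(p_ij - u_i^T v_j)^2, in which almost every target p_ij is 0. That
term and the ridge term are then both minimized near u = v = 0, and
only the relatively few p_ij = 1 entries push back.)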


Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-04-01 Thread Xiangrui Meng
Ravi, we just merged https://issues.apache.org/jira/browse/SPARK-6642
and used the same lambda scaling as in 1.2. The change will be
included in Spark 1.3.1, which will be released soon. Thanks for
reporting this issue! -Xiangrui


Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-31 Thread Sean Owen
I had always understood the formulation to be the first option you
describe: lambda is scaled by the number of items the user has rated /
interacted with. I think the goal is to avoid fitting the tastes of
prolific users disproportionately just because they have many ratings
to fit. This is what's described in the ALS-WR paper we link to on the
Spark web site, in equation 5
(http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf).

I think this also gets you the scale invariance? For every additional
rating from user i to product j, you add one new term to the
squared-error sum, (r_ij - u_i . m_j)^2, but you'd also increase the
regularization term by lambda * (|u_i|^2 + |m_j|^2). They are at least
both increasing about linearly as ratings increase. If the
regularization term is multiplied by the total number of users and
products in the model, then it's fixed.

I might misunderstand you and/or be speaking about something slightly
different when it comes to invariance. But FWIW I had always
understood the regularization to be multiplied by the number of
explicit ratings.
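
(For reference, equation 5 of that paper, modulo notation, is the
weighted-lambda-regularization objective

  f(U, M) = \sum_{(i,j) in I} (r_ij - u_i^T m_j)^2
            + lambda * ( \sum_i n_{u_i} * \|u_i\|^2
                       + \sum_j n_{m_j} * \|m_j\|^2 )

where I is the set of observed ratings, n_{u_i} is the number of
ratings by user i, and n_{m_j} is the number of ratings of item j.)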


Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-31 Thread Xiangrui Meng
I created a JIRA for this:
https://issues.apache.org/jira/browse/SPARK-6637. Since we don't have
a clear answer about how the scaling should be handled, maybe the best
solution for now is to switch back to the 1.2 scaling. -Xiangrui

On Tue, Mar 31, 2015 at 2:50 PM, Sean Owen so...@cloudera.com wrote:
 Ah yeah, I take your point. The squared error term is over the whole
 user-item matrix, technically, in the implicit case. I suppose I am
 used to assuming that the 0 terms in this matrix are weighted so much
 less (because alpha is usually large-ish) that they're almost not
 there, but they are. So I had just used the explicit formulation.

 I suppose the result is kind of scale invariant, but not exactly. I
 had not prioritized this property, since I had generally built models
 on the full data set and not a sample, and had assumed that lambda
 would need to be retuned over time as the input grew anyway.

 So, basically I don't know anything more than you do, sorry!


Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-31 Thread Xiangrui Meng
Hey Sean,

That is true for the explicit model, but not for the implicit one. The
ALS-WR paper doesn't cover the implicit model. In the implicit
formulation, the sub-problem (for v_j) is:

min_{v_j} \sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2

This is a sum over all i, not just the users who rated item j. In this
case, if we set X = m_j, the number of observed ratings for item j, it
is not really scale invariant: we have #users user vectors in the
least squares problem but only penalize by lambda * #ratings. I was
suggesting using lambda * m directly for the implicit model, to match
the number of vectors in the least squares problem. Well, this is my
theory; I haven't found any public work about it.
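
To make the competing choices of X concrete, here is a minimal solver
for the sub-problem above. It is a sketch in Breeze (which MLlib
builds on); the helper and its names are mine, not the actual ALS
code. Only the computation of x differs between the proposals; the
rest is the standard confidence-weighted normal equation.

  import breeze.linalg._

  // Sub-problem for one item j:
  //   min_v  \sum_i c_i (p_i - u_i^T v)^2 + lambda * X * |v|^2
  // U: numUsers x rank user-factor matrix; ratings: user index -> r_ij.
  def solveItemFactor(U: DenseMatrix[Double],
                      ratings: Map[Int, Double],
                      alpha: Double,
                      lambda: Double,
                      scaling: String): DenseVector[Double] = {
    val rank = U.cols
    val gram = U.t * U                      // all users: c_i = 1, p_i = 0
    val rhs = DenseVector.zeros[Double](rank)
    for ((i, r) <- ratings) {               // observed: c_i = 1 + alpha*r, p_i = 1
      val u = U(i, ::).t
      gram += (u * u.t) * (alpha * r)       // add (c_i - 1) * u_i u_i^T
      rhs += u * (1.0 + alpha * r)          // add c_i * p_i * u_i
    }
    val x = scaling match {
      case "1.2"     => ratings.size.toDouble                // m_j: observed ratings
      case "1.3"     => U.rows.toDouble                      // m: all users
      case "sum_cij" => U.rows + alpha * ratings.values.sum  // \sum_i c_ij
    }
    (gram + DenseMatrix.eye[Double](rank) * (lambda * x)) \ rhs
  }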

Best,
Xiangrui




Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-30 Thread Xiangrui Meng
Okay, I didn't realize that I had changed the behavior of lambda in
1.3 to make it scale-invariant, but it is worth discussing whether
this is a good change. In 1.2, we multiply lambda by the number of
ratings in each sub-problem. This makes it scale-invariant for
explicit feedback. However, in the implicit feedback model, a user's
sub-problem contains all item factors. Then the question is whether we
should multiply lambda by the number of explicit ratings from this
user or by the total number of items. We used the former in 1.2 but
changed to the latter in 1.3. So you should try a smaller lambda to
get a similar result in 1.3.
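
(Back-of-envelope on "try a smaller lambda", my own arithmetic: a
user's sub-problem is penalized by lambda * n_u in 1.2, where n_u is
that user's rating count, but by lambda * m in 1.3, where m is the
total number of items. Matching a 1.2 run therefore requires roughly

  lambda_1.3 ~= lambda_1.2 * n_u / m

per user, which cannot hold exactly for every user at once since n_u
varies. With, say, n_u around 100 and m around 10^6, that is a factor
of 10^4, consistent with the orders-of-magnitude shrinkage Ravi
reported.)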

Sean and Shuo, which approach do you prefer? Do you know of any
existing work discussing this?

Best,
Xiangrui





Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-27 Thread Xiangrui Meng
This sounds like a bug ... Did you try a different lambda? It would be
great if you could share your dataset or reproduce this issue on a
public dataset. Thanks! -Xiangrui




Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

2015-03-26 Thread Ravi Mody
After upgrading to 1.3.0, ALS.trainImplicit() has been returning
vastly smaller factors (and hence scores). For example, the first few
product factor values in 1.2.0 are (0.04821, -0.00674, -0.0325). In
1.3.0, the first few factor values are (2.535456E-8, 1.690301E-8,
6.99245E-8). This difference of several orders of magnitude is
consistent across both the user and product factors. The
recommendations from 1.2.0 are subjectively much better than in 1.3.0.
1.3.0 trains significantly faster than 1.2.0, and uses less memory.

My first thought is that there is too much regularization in the 1.3.0
results, but I'm using the same lambda parameter value. This is a
snippet of my Scala code:
.
val rank = 75
val numIterations = 15
val alpha = 10
val lambda = 0.01
val model = ALS.trainImplicit(train_data, rank, numIterations,
lambda=lambda, alpha=alpha)
.

The code and input data are identical across both versions. Did
anything change between the two versions that I'm not aware of? I'd
appreciate any help!
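
A quick way to test the over-regularization theory is to sweep lambda
and watch the factor magnitudes; if the 1.3.0 factors grow back toward
the 1.2.0 scale as lambda shrinks, the changed regularization scaling
is the likely cause. This diagnostic is a sketch reusing the names
from the snippet above:

  for (l <- Seq(1e-2, 1e-4, 1e-6, 1e-8)) {
    val model = ALS.trainImplicit(train_data, rank, numIterations, l, alpha)
    // Mean L2 norm of the product factors, as a proxy for factor scale.
    val meanNorm = model.productFeatures
      .map { case (_, f) => math.sqrt(f.map(x => x * x).sum) }
      .mean()
    println(s"lambda=$l  mean product factor norm=$meanNorm")
  }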