In the implicit feedback model, the coefficients are already penalized
(pulled towards zero) by the number of unobserved ratings. So I think it
is fair to keep the 1.3.0 weighting (by the total number of users/items).
Again, I don't think we have a clear answer. It would be nice to run
some experiments and see which works better. -Xiangrui
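
For reference, a sketch of the implicit-feedback objective under discussion
(the standard confidence-weighted formulation that trainImplicit follows;
reconstructed here, so treat the notation as approximate):

  min_{U,V} \sum_{i,j} c_ij (p_ij - u_i^T v_j)^2
            + lambda * (\sum_i \|u_i\|_2^2 + \sum_j \|v_j\|_2^2)

  with c_ij = 1 + alpha * r_ij, and p_ij = 1 if r_ij > 0, else 0.

The first sum runs over every (user, item) pair, so each unobserved entry
(c_ij = 1, p_ij = 0) already contributes a unit-weight pull of the factors
towards zero; the debate is only about how lambda gets rescaled in each
sub-problem.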

On Thu, May 7, 2015 at 9:35 AM, Ravi Mody <rmody...@gmail.com> wrote:
> After thinking about it more, I do think weighting lambda by sum_i c_ij is
> the equivalent of the ALS-WR paper's approach for the implicit case. This
> provides scale-invariance for varying products/users and for varying
> ratings, and should behave well for all alphas. What do you guys think?
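>
> A minimal sketch of that weighting (hypothetical helper, not Spark code),
> with c_ij = 1 + alpha * r_ij taken over all users i for a given item j:
>
> // Effective regularization scale for item j's sub-problem under the
> // proposed sum_i c_ij weighting. Since c_ij = 1 + alpha * r_ij, this is
> // numUsers + alpha * (sum of item j's observed ratings): it reduces to
> // the 1.3.0 scaling (numUsers) as alpha -> 0 and grows with the ratings
> // as alpha gets large.
> def regWeightSumCij(numUsers: Long, alpha: Double, observedRatings: Seq[Double]): Double =
>   numUsers + alpha * observedRatings.sum
>
> // Example: 1,000,000 users, alpha = 10, an item with three observed ratings.
> regWeightSumCij(1000000L, 10.0, Seq(1.0, 2.0, 1.0))  // 1000040.0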
>
> On Wed, May 6, 2015 at 12:29 PM, Ravi Mody <rmody...@gmail.com> wrote:
>>
>> Whoops I just saw this thread, it got caught in my spam filter. Thanks for
>> looking into this Xiangrui and Sean.
>>
>> The implicit situation does seem fairly complicated to me. The cost
>> function (not including the regularization term) is affected both by the
>> number of ratings and by the number of users/products. As we increase
>> alpha, the contribution to the cost function from the number of
>> users/products diminishes compared to the contribution from the number of
>> ratings. So large alphas seem to favor the weighted-lambda approach, even
>> though it's not a perfect match. Smaller alphas favor Xiangrui's 1.3.0
>> approach, but again it's not a perfect match.
>>
>> I believe low alphas won't work well with regularization because both
>> terms in the cost function will just push everything to zero. Some of my
>> experiments confirm this. This leads me to think that weighted-lambda
>> would work better in practice, but I have no evidence of this. It may
>> make sense to weight lambda by sum_i c_ij instead?
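>>
>> A rough illustration of that alpha trade-off (toy sizes, not from any
>> real dataset):
>>
>> // Fraction of the total confidence mass sum_{ij} c_ij that comes from
>> // the observed entries, for a 1e6-user x 1e4-item matrix with 1e7
>> // observed ratings of value 1 and c_ij = 1 + alpha * r_ij. Small alpha
>> // leaves the (mostly zero) unobserved entries dominating the cost;
>> // large alpha lets the observed ratings dominate.
>> val totalEntries  = 1e6 * 1e4   // every (user, item) pair contributes "+1"
>> val observedCount = 1e7         // observed ratings, each with r_ij = 1
>> for (alpha <- Seq(0.1, 1.0, 10.0, 40.0)) {
>>   val observedShare = (alpha * observedCount) / (totalEntries + alpha * observedCount)
>>   println(f"alpha = $alpha%5.1f -> observed share of cost weight = $observedShare%.4f")
>> }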
>>
>> On Wed, Apr 1, 2015 at 7:59 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>> Ravi, we just merged https://issues.apache.org/jira/browse/SPARK-6642
>>> and used the same lambda scaling as in 1.2. The change will be
>>> included in Spark 1.3.1, which will be released soon. Thanks for
>>> reporting this issue! -Xiangrui
>>>
>>> On Tue, Mar 31, 2015 at 8:53 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>> > I created a JIRA for this:
>>> > https://issues.apache.org/jira/browse/SPARK-6637. Since we don't have
>>> > a clear answer about how the scaling should be handled, maybe the best
>>> > solution for now is to switch back to the 1.2 scaling. -Xiangrui
>>> >
>>> > On Tue, Mar 31, 2015 at 2:50 PM, Sean Owen <so...@cloudera.com> wrote:
>>> >> Ah yeah I take your point. The squared error term is over the whole
>>> >> user-item matrix, technically, in the implicit case. I suppose I am
>>> >> used to assuming that the 0 terms in this matrix are weighted so much
>>> >> less (because alpha is usually large-ish) that they're almost not
>>> >> there, but they are. So I had just used the explicit formulation.
>>> >>
>>> >> I suppose the result is kind of scale invariant, but not exactly. I
>>> >> had not prioritized this property since I had generally built models
>>> >> on the full data set and not a sample, and had assumed that lambda
>>> >> would need to be retuned over time as the input grew anyway.
>>> >>
>>> >> So, basically I don't know anything more than you do, sorry!
>>> >>
>>> >> On Tue, Mar 31, 2015 at 10:41 PM, Xiangrui Meng <men...@gmail.com>
>>> >> wrote:
>>> >>> Hey Sean,
>>> >>>
>>> >>> That is true for the explicit model, but not for the implicit one.
>>> >>> The ALS-WR paper doesn't cover the implicit model. In the implicit
>>> >>> formulation, the sub-problem for v_j is:
>>> >>>
>>> >>> min_{v_j} \sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2
>>> >>>
>>> >>> This is a sum over all i, not just the users who rated item j. In this
>>> >>> case, if we set X = m_j, the number of observed ratings for item j, it
>>> >>> is not really scale invariant: we have #users user vectors in the
>>> >>> least squares problem but penalize only lambda * #ratings. I was
>>> >>> suggesting using lambda * m (the total number of users) directly for
>>> >>> the implicit model, to match the number of vectors in the least
>>> >>> squares problem. Well, this is my theory; I haven't found any
>>> >>> published work about it.
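>>> >>>
>>> >>> (For reference, the closed-form solution of that sub-problem, writing
>>> >>> C_j = diag(c_1j, ..., c_nj) over all n users and p_j = (p_1j, ..., p_nj),
>>> >>> is roughly
>>> >>>
>>> >>> v_j = (U^T C_j U + lambda * X * I)^{-1} U^T C_j p_j
>>> >>>
>>> >>> so X only rescales the ridge term against a Gram matrix built from
>>> >>> all #users rows of U, which is the mismatch described above.)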
>>> >>>
>>> >>> Best,
>>> >>> Xiangrui
>>> >>>
>>> >>> On Tue, Mar 31, 2015 at 5:17 AM, Sean Owen <so...@cloudera.com>
>>> >>> wrote:
>>> >>>> I had always understood the formulation to be the first option you
>>> >>>> describe. Lambda is scaled by the number of items the user has
>>> >>>> rated / interacted with. I think the goal is to avoid fitting the
>>> >>>> tastes of prolific users disproportionately just because they have
>>> >>>> many ratings to fit. This is what's described in the ALS-WR paper we
>>> >>>> link to on the Spark web site, in equation 5
>>> >>>>
>>> >>>> (http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf)
>>> >>>>
>>> >>>> I think this also gets you the scale-invariance? For every additional
>>> >>>> rating from user i to product j, you add one new term to the
>>> >>>> squared-error sum, (r_ij - u_i . m_j)^2, but you'd also increase the
>>> >>>> regularization term by lambda * (|u_i|^2 + |m_j|^2). They are at
>>> >>>> least both increasing about linearly as ratings increase. If the
>>> >>>> regularization term were instead multiplied by the total number of
>>> >>>> users and products in the model, then it would be fixed.
>>> >>>>
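>>> >>>> A sketch of that weighted-lambda objective (equation 5 of the linked
>>> >>>> paper, reconstructed here from memory, with n_{u_i} the number of
>>> >>>> ratings by user i and n_{m_j} the number of ratings on item j):
>>> >>>>
>>> >>>> f(U, M) = \sum_{(i,j) observed} (r_ij - u_i^T m_j)^2
>>> >>>>           + lambda * (\sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2)
>>> >>>>
>>> >>>> Each new rating adds one squared-error term and increments n_{u_i}
>>> >>>> and n_{m_j} by one, which is the roughly linear growth on both sides
>>> >>>> described above.
>>> >>>>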
>>> >>>> I might misunderstand you and/or be speaking about something slightly
>>> >>>> different when it comes to invariance. But FWIW I had always
>>> >>>> understood the regularization to be multiplied by the number of
>>> >>>> explicit ratings.
>>> >>>>
>>> >>>> On Mon, Mar 30, 2015 at 5:51 PM, Xiangrui Meng <men...@gmail.com>
>>> >>>> wrote:
>>> >>>>> Okay, I didn't realize that I changed the behavior of lambda in 1.3
>>> >>>>> to make it "scale-invariant", but it is worth discussing whether this
>>> >>>>> is a good change. In 1.2, we multiply lambda by the number of ratings
>>> >>>>> in each sub-problem. This makes it "scale-invariant" for explicit
>>> >>>>> feedback. However, in the implicit feedback model, a user's
>>> >>>>> sub-problem contains all item factors. The question is then whether
>>> >>>>> we should multiply lambda by the number of explicit ratings from this
>>> >>>>> user or by the total number of items. We used the former in 1.2 but
>>> >>>>> changed to the latter in 1.3. So you should try a smaller lambda to
>>> >>>>> get a similar result in 1.3.
>>> >>>>>
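>>> >>>>> A minimal sketch of the two scalings for one user's implicit
>>> >>>>> sub-problem (hypothetical helpers, not the actual ALS code):
>>> >>>>>
>>> >>>>> // Effective ridge penalty applied to one user's sub-problem.
>>> >>>>> // 1.2: lambda scaled by that user's observed rating count.
>>> >>>>> def penalty12(lambda: Double, userRatingCount: Int): Double =
>>> >>>>>   lambda * userRatingCount
>>> >>>>>
>>> >>>>> // 1.3: lambda scaled by the total number of items, since every item
>>> >>>>> // factor appears in the implicit sub-problem.
>>> >>>>> def penalty13(lambda: Double, numItems: Long): Double =
>>> >>>>>   lambda * numItems
>>> >>>>>
>>> >>>>> // Example: lambda = 0.01, a user with 50 ratings, a 1M-item catalog.
>>> >>>>> penalty12(0.01, 50)       // 0.5
>>> >>>>> penalty13(0.01, 1000000L) // 10000.0, same lambda, 20000x more shrinkage
>>> >>>>>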
>>> >>>>> Sean and Shuo, which approach do you prefer? Do you know of any
>>> >>>>> existing work discussing this?
>>> >>>>>
>>> >>>>> Best,
>>> >>>>> Xiangrui
>>> >>>>>
>>> >>>>>
>>> >>>>> On Fri, Mar 27, 2015 at 11:27 AM, Xiangrui Meng <men...@gmail.com>
>>> >>>>> wrote:
>>> >>>>>> This sounds like a bug ... Did you try a different lambda? It would
>>> >>>>>> be great if you could share your dataset or reproduce this issue on
>>> >>>>>> a public dataset. Thanks! -Xiangrui
>>> >>>>>>
>>> >>>>>> On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody <rmody...@gmail.com>
>>> >>>>>> wrote:
>>> >>>>>>> After upgrading to 1.3.0, ALS.trainImplicit() has been returning
>>> >>>>>>> vastly smaller factors (and hence scores). For example, the first
>>> >>>>>>> few factor values of a product in 1.2.0 are (0.04821, -0.00674,
>>> >>>>>>> -0.0325). In 1.3.0, the first few factor values are (2.535456E-8,
>>> >>>>>>> 1.690301E-8, 6.99245E-8). This difference of several orders of
>>> >>>>>>> magnitude is consistent throughout both the user and product
>>> >>>>>>> factors. The recommendations from 1.2.0 are subjectively much
>>> >>>>>>> better than in 1.3.0. 1.3.0 trains significantly faster than 1.2.0
>>> >>>>>>> and uses less memory.
>>> >>>>>>>
>>> >>>>>>> My first thought is that there is too much regularization in the
>>> >>>>>>> 1.3.0 results, but I'm using the same lambda parameter value. This
>>> >>>>>>> is a snippet of my Scala code:
>>> >>>>>>> .....
>>> >>>>>>> val rank = 75
>>> >>>>>>> val numIterations = 15
>>> >>>>>>> val alpha = 10
>>> >>>>>>> val lambda = 0.01
>>> >>>>>>> val model = ALS.trainImplicit(train_data, rank, numIterations,
>>> >>>>>>> lambda=lambda, alpha=alpha)
>>> >>>>>>> .....
>>> >>>>>>>
>>> >>>>>>> The code and input data are identical across both versions. Did
>>> >>>>>>> anything change between the two versions that I'm not aware of?
>>> >>>>>>> I'd appreciate any help!
>>> >>>>>>>
>>
>>
>
