Re: Implicit matrix factorization returning different results between spark 1.2.0 and 1.3.0

Sean Owen Tue, 31 Mar 2015 05:18:13 -0700

I had always understood the formulation to be the first option you
describe. Lambda is scaled by the number of items the user has rated /
interacted with. I think the goal is to avoid fitting the tastes of
prolific users disproportionately just because they have many ratings
to fit. This is what's described in the ALS-WR paper we link to on the
Spark web site, in equation 5
(http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf)


I think this also gets you the scale-invariance? For every additional
rating from user i to product j, you add one new term to the
squared-error sum, (r_ij - u_i . m_j)^2, but you also increase the
regularization term by lambda * (|u_i|^2 + |m_j|^2). So both grow
about linearly as ratings increase. If the regularization term is
instead multiplied by the total number of users and products in the
model, then it's fixed and does not grow with the number of ratings.
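
To make the argument above concrete, here is the weighted-lambda
objective as I understand it from the ALS-WR paper, written out in
LaTeX. The notation is my own transcription: I is the set of observed
(i, j) pairs, n_{u_i} is the number of ratings by user i, and n_{m_j}
is the number of ratings of product j. Treat it as a sketch rather
than a quote from the paper.

```latex
% Weighted-lambda objective (ALS-WR, eq. 5); I = set of observed (i, j) pairs
f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{T} m_j \right)^2
        + \lambda \left( \sum_i n_{u_i} \lVert u_i \rVert^2
                       + \sum_j n_{m_j} \lVert m_j \rVert^2 \right)

% One more rating (i, j) raises n_{u_i} and n_{m_j} by 1, so with the
% factors held fixed the objective grows by
\Delta f = \left( r_{ij} - u_i^{T} m_j \right)^2
         + \lambda \left( \lVert u_i \rVert^2 + \lVert m_j \rVert^2 \right)
```

Both pieces of \Delta f appear once per new rating, which is the
"both increasing about linearly" point.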

I might misunderstand you and/or be speaking about something slightly
different when it comes to invariance. But FWIW I had always
understood the regularization to be multiplied by the number of
explicit ratings.
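
As a rough illustration of the two scalings discussed in this thread,
here is a small plain-Scala sketch (no Spark needed). It is not the
MLlib implementation: the object and method names are hypothetical,
and the catalogue size and per-user rating count are made-up numbers.
It only shows how the effective regularization weight on one user's
sub-problem differs when lambda is multiplied by that user's number
of ratings versus the total number of items.

```scala
// Hypothetical illustration only; this is not Spark's ALS code.
object LambdaScaling {
  // Scaling described for 1.2: lambda times the number of explicit
  // ratings from this user.
  def weightByRatings(lambda: Double, numRatingsByUser: Int): Double =
    lambda * numRatingsByUser

  // Scaling described for 1.3 (implicit feedback): lambda times the
  // total number of items, since the sub-problem covers all items.
  def weightByItems(lambda: Double, numItems: Int): Double =
    lambda * numItems

  // Lambda that, under the by-items scaling, gives the same weight
  // as `lambda` did under the by-ratings scaling for this one user.
  def rescaledLambda(lambda: Double, numRatingsByUser: Int, numItems: Int): Double =
    lambda * numRatingsByUser.toDouble / numItems

  def main(args: Array[String]): Unit = {
    val lambda = 0.01          // value from the original post
    val numItems = 100000      // assumed catalogue size
    val numRatingsByUser = 20  // assumed activity of one user

    println(s"weight, by ratings: ${weightByRatings(lambda, numRatingsByUser)}")
    println(s"weight, by items:   ${weightByItems(lambda, numItems)}")
    println(s"rescaled lambda:    ${rescaledLambda(lambda, numRatingsByUser, numItems)}")
  }
}
```

Because users have different rating counts, no single rescaled lambda
matches the old behaviour for every user at once; something like the
average rating count per user over the item count is only a starting
point for tuning.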

On Mon, Mar 30, 2015 at 5:51 PM, Xiangrui Meng <men...@gmail.com> wrote:
> Okay, I didn't realize that I changed the behavior of lambda in 1.3
> to make it "scale-invariant", but it is worth discussing whether this
> is a good change. In 1.2, we multiply lambda by the number of ratings in
> each sub-problem. This makes it "scale-invariant" for explicit
> feedback. However, in the implicit feedback model, a user's sub-problem
> contains all item factors. Then the question is whether we should
> multiply lambda by the number of explicit ratings from this user or by
> the total number of items. We used the former in 1.2 but changed to
> the latter in 1.3. So you should try a smaller lambda to get a similar
> result in 1.3.
>
> Sean and Shuo, which approach do you prefer? Do you know any existing
> work discussing this?
>
> Best,
> Xiangrui
>
>
> On Fri, Mar 27, 2015 at 11:27 AM, Xiangrui Meng <men...@gmail.com> wrote:
>> This sounds like a bug ... Did you try a different lambda? It would be
>> great if you can share your dataset or reproduce this issue on a
>> public dataset. Thanks! -Xiangrui
>>
>> On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody <rmody...@gmail.com> wrote:
>>> After upgrading to 1.3.0, ALS.trainImplicit() has been returning vastly
>>> smaller factors (and hence scores). For example, the first few product's
>>> factor values in 1.2.0 are (0.04821, -0.00674,  -0.0325). In 1.3.0, the
>>> first few factor values are (2.535456E-8, 1.690301E-8, 6.99245E-8). This
>>> difference of several orders of magnitude is consistent throughout both user
>>> and product factors. The recommendations from 1.2.0 are subjectively much better
>>> than in 1.3.0. 1.3.0 trains significantly faster than 1.2.0, and uses less
>>> memory.
>>>
>>> My first thought is that there is too much regularization in the 1.3.0
>>> results, but I'm using the same lambda parameter value. This is a snippet of
>>> my Scala code:
>>> .....
>>> val rank = 75
>>> val numIterations = 15
>>> val alpha = 10
>>> val lambda = 0.01
>>> val model = ALS.trainImplicit(train_data, rank, numIterations,
>>> lambda=lambda, alpha=alpha)
>>> .....
>>>
>>> The code and input data are identical across both versions. Did anything
>>> change between the two versions I'm not aware of? I'd appreciate any help!
>>>
