I created a JIRA for this: https://issues.apache.org/jira/browse/SPARK-6637. Since we don't have a clear answer about how the scaling should be handled, maybe the best solution for now is to switch back to the 1.2 scaling. -Xiangrui
On Tue, Mar 31, 2015 at 2:50 PM, Sean Owen <so...@cloudera.com> wrote:
> Ah yeah I take your point. The squared error term is over the whole
> user-item matrix, technically, in the implicit case. I suppose I am
> used to assuming that the 0 terms in this matrix are weighted so much
> less (because alpha is usually large-ish) that they're almost not
> there, but they are. So I had just used the explicit formulation.
>
> I suppose the result is kind of scale invariant, but not exactly. I
> had not prioritized this property since I had generally built models
> on the full data set and not a sample, and had assumed that lambda
> would need to be retuned over time as the input grew anyway.
>
> So, basically I don't know anything more than you do, sorry!
>
> On Tue, Mar 31, 2015 at 10:41 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> Hey Sean,
>>
>> That is true for the explicit model, but not for the implicit one. The
>> ALS-WR paper doesn't cover the implicit model. In the implicit
>> formulation, a sub-problem (for v_j) is:
>>
>> min_{v_j} \sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2
>>
>> This is a sum over all i, not just over the users who rated item j. In
>> this case, if we set X=m_j, the number of observed ratings for item j,
>> it is not really scale invariant. We have #users user vectors in the
>> least squares problem but only penalize lambda * #ratings. I was
>> suggesting using lambda * m directly for the implicit model to match
>> the number of vectors in the least squares problem. Well, this is my
>> theory. I haven't found any public work about it.
>>
>> Best,
>> Xiangrui
>>
>> On Tue, Mar 31, 2015 at 5:17 AM, Sean Owen <so...@cloudera.com> wrote:
>>> I had always understood the formulation to be the first option you
>>> describe. Lambda is scaled by the number of items the user has rated /
>>> interacted with. I think the goal is to avoid fitting the tastes of
>>> prolific users disproportionately just because they have many ratings
>>> to fit.
>>> This is what's described in the ALS-WR paper we link to on the
>>> Spark web site, in equation 5
>>> (http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf)
>>>
>>> I think this also gets you the scale-invariance? For every additional
>>> rating from user i to product j, you add one new term to the
>>> squared-error sum, (r_ij - u_i . m_j)^2, but also, you'd increase the
>>> regularization term by lambda * (|u_i|^2 + |m_j|^2). They are at least
>>> both increasing about linearly as ratings increase. If the
>>> regularization term is multiplied by the total number of users and
>>> products in the model, then it's fixed.
>>>
>>> I might misunderstand you and/or be speaking about something slightly
>>> different when it comes to invariance. But FWIW I had always
>>> understood the regularization to be multiplied by the number of
>>> explicit ratings.
>>>
>>> On Mon, Mar 30, 2015 at 5:51 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>> Okay, I didn't realize that I changed the behavior of lambda in 1.3
>>>> to make it "scale-invariant", but it is worth discussing whether this
>>>> is a good change. In 1.2, we multiply lambda by the number of ratings
>>>> in each sub-problem. This makes it "scale-invariant" for explicit
>>>> feedback. However, in the implicit feedback model, a user's
>>>> sub-problem contains all item factors. Then the question is whether we
>>>> should multiply lambda by the number of explicit ratings from this
>>>> user or by the total number of items. We used the former in 1.2 but
>>>> changed to the latter in 1.3. So you should try a smaller lambda to
>>>> get a similar result in 1.3.
>>>>
>>>> Sean and Shuo, which approach do you prefer? Do you know any existing
>>>> work discussing this?
>>>>
>>>> Best,
>>>> Xiangrui
>>>>
>>>>
>>>> On Fri, Mar 27, 2015 at 11:27 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>> This sounds like a bug ... Did you try a different lambda?
>>>>> It would be great if you could share your dataset or reproduce this
>>>>> issue on a public dataset. Thanks! -Xiangrui
>>>>>
>>>>> On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody <rmody...@gmail.com> wrote:
>>>>>> After upgrading to 1.3.0, ALS.trainImplicit() has been returning vastly
>>>>>> smaller factors (and hence scores). For example, the first few product's
>>>>>> factor values in 1.2.0 are (0.04821, -0.00674, -0.0325). In 1.3.0, the
>>>>>> first few factor values are (2.535456E-8, 1.690301E-8, 6.99245E-8). This
>>>>>> difference of several orders of magnitude is consistent across both
>>>>>> user and product factors. The recommendations from 1.2.0 are
>>>>>> subjectively much better than in 1.3.0. 1.3.0 trains significantly
>>>>>> faster than 1.2.0, and uses less memory.
>>>>>>
>>>>>> My first thought is that there is too much regularization in the 1.3.0
>>>>>> results, but I'm using the same lambda parameter value. This is a
>>>>>> snippet of my Scala code:
>>>>>> .....
>>>>>> val rank = 75
>>>>>> val numIterations = 15
>>>>>> val alpha = 10
>>>>>> val lambda = 0.01
>>>>>> val model = ALS.trainImplicit(train_data, rank, numIterations,
>>>>>>   lambda=lambda, alpha=alpha)
>>>>>> .....
>>>>>>
>>>>>> The code and input data are identical across both versions. Did anything
>>>>>> change between the two versions that I'm not aware of? I'd appreciate
>>>>>> any help!
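
To make the effect of the multiplier concrete, here is a minimal sketch of the item sub-problem quoted in the thread, min_{v_j} \sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2, reduced to rank 1 so that it has a closed form. This is not the MLlib implementation. The object name, the function name, and the toy data are hypothetical, and the choices c = 1 + alpha * r and p = 1 when r > 0 are the usual implicit-feedback conventions, assumed here rather than taken from the thread. It compares X = number of observed ratings for the item (the 1.2 behaviour as described above) with X = total number of users (the 1.3 behaviour, as I read the thread).

```scala
object ImplicitAlsScalingSketch {

  // Rank-1 closed form for
  //   min_v  sum_i c_i * (p_i - u_i * v)^2 + lambda * x * v^2
  // Setting the derivative to zero gives
  //   v = (sum_i c_i * p_i * u_i) / (sum_i c_i * u_i^2 + lambda * x).
  // Assumed conventions: c_i = 1 + alpha * r_i, p_i = 1 if r_i > 0 else 0.
  def solveItemFactor(u: Array[Double], r: Array[Double],
                      alpha: Double, lambda: Double, x: Double): Double = {
    require(u.length == r.length, "one rating slot per user")
    var num = 0.0
    var den = lambda * x // regularization term, scaled by the multiplier x
    for (i <- u.indices) {
      val c = 1.0 + alpha * r(i)           // confidence weight
      val p = if (r(i) > 0) 1.0 else 0.0   // binary preference
      num += c * p * u(i)
      den += c * u(i) * u(i)
    }
    num / den
  }

  def main(args: Array[String]): Unit = {
    // Toy data (hypothetical): 1000 users, only the first 5 interacted with the item.
    val numUsers = 1000
    val numObserved = 5
    val u = Array.fill(numUsers)(0.1)
    val r = Array.tabulate(numUsers)(i => if (i < numObserved) 1.0 else 0.0)
    val alpha = 10.0   // same alpha and lambda as in the original post
    val lambda = 0.01

    val vObserved = solveItemFactor(u, r, alpha, lambda, numObserved.toDouble)
    val vAllUsers = solveItemFactor(u, r, alpha, lambda, numUsers.toDouble)
    println(s"X = #observed ratings: v = $vObserved")
    println(s"X = #users:            v = $vAllUsers")
  }
}
```

With the same lambda, the larger multiplier produces a smaller factor, because the penalty now grows with the number of users rather than with the number of interactions. The toy numbers are not meant to reproduce the magnitudes reported in the original post; in the full alternating procedure the shrinkage may also compound across iterations.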