Yes, I think we're all thinking along the right lines, but the narration needs some tweaking:
On Wed, Sep 9, 2009 at 11:42 PM, Gökhan Çapan <[email protected]> wrote:
> Then (A' A)ij is a similarity weight between ith and jth items.
> if Aij is the "rating of ith user for jth item", the highest value of
> "ith row of A' A" is the most similar item for "ith item".
> if the values in A are binary, then (A' A)ij is number of users who have
> rated/clicked/viewed both item i and item j.

Yes, well, as Ted says, they aren't quite usable as weights in that form. There is some additional work to do.

> 1. basic matrix approach:
>
> weight := 0;
> for each user u
>   if u rated both X and Y
>     weight += rating(u,X) * rating(u,Y);

One conceptual question I had: in a matrix-based approach, unrated items 'appear' as actual 0 ratings, which doesn't work. So there's a second reason the basic version isn't 'just' a dot product. Is this really the computation?

> 2. Taste's implementation was:
>
> weight := 0;
> for each user u
>   if rating(u,X) != 0 and rating(u,Y) != 0
>     updateWeightWithChosenSimilarityMetric( );

Yes, more like it. Though actually there's no "if rating is not zero" logic -- "0" is a real, valid rating. The test would be "if X and Y have been rated".

> 3. After the change:

Same comment -- I don't think 'non-zero' is the right test.

You are describing how to compute a single item-item similarity, and yes, the code was already taking advantage of sparsity as you say. The change you proposed was one level higher, in the item-based recommender computation: don't compute this for every item in the model, only for candidates that could possibly be recommended. So there is one loop outside this that used to be, in effect, 'for each item i that the user has not rated', and is now a more complex rule that yields significantly fewer items.

Anyway, like I said, it doesn't seem we're really doing a dot product to get the result matrix, but some other computation. Is it then really a 'matrix multiplication'? The transformations Ted describes are computing the similarity metrics found in the ItemSimilarity / UserSimilarity implementations. So the point I am getting at is: I wonder whether what the code does now is actually equivalent to this pseudo-matrix-multiplication. I haven't thought it through, but I would hope and imagine that it's just an uglier (though, I'd hope, faster, because special-purposed) version of the same computation. That would be good, since it would mean the current implementation isn't 'missing something'.

But Ted's last message is the real point: a true matrix multiply can be parallelized, and true parallelization would be a useful addition to the set of tools we provide.

Yes -- that second mapreduce is the real question! Is there such a parallelizable black box that transforms the matrix multiply result into recommendations? I'm guessing there is indeed a matrix that makes the result into useful similarity weights; then this can be finished off by multiplying by the user's rating vector.

Now that works... but my gut reaction is that this does way more number crunching than the existing implementation. Just producing A'A is a lot of work -- yes, I realize A is sparse, but is the product? At scale that's way more number crunching. Is that the price of parallelizing? Because indeed I don't see a way to parallelize other than to produce parallelized versions of the simple, if literal and heavy, matrix operations you describe.

(A few rough sketches of what I mean follow below.)
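To pin down the binary case Gökhan describes, here is a tiny brute-force sketch (plain Java, names just illustrative). For a 0/1 user-item matrix A, entry (i,j) of A'A is exactly the number of users who touched both items:

    // (A'A)[i][j] for a 0/1 user-item matrix 'a' (users x items) is the
    // number of users who have rated/clicked/viewed both item i and item j.
    static int[][] cooccurrence(int[][] a) {
      int numUsers = a.length;
      int numItems = a[0].length;
      int[][] result = new int[numItems][numItems];
      for (int i = 0; i < numItems; i++) {
        for (int j = 0; j < numItems; j++) {
          int count = 0;
          for (int u = 0; u < numUsers; u++) {
            count += a[u][i] * a[u][j];  // adds 1 only when user u has both
          }
          result[i][j] = count;
        }
      }
      return result;
    }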
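And here is roughly what I mean by testing "X and Y have been rated" rather than "rating != 0": a sketch where a missing rating is null, so a genuine rating of 0 still counts. The += line is just a stand-in for whatever similarity-metric update is chosen:

    // ratingsForX[u] / ratingsForY[u]: user u's ratings for items X and Y,
    // or null if u hasn't rated that item. 0.0f is a real, valid rating.
    static double weight(Float[] ratingsForX, Float[] ratingsForY) {
      double weight = 0.0;
      for (int u = 0; u < ratingsForX.length; u++) {
        Float x = ratingsForX[u];
        Float y = ratingsForY[u];
        if (x != null && y != null) {  // "X and Y have been rated"
          weight += x * y;             // stand-in for the chosen metric update
        }
      }
      return weight;
    }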
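The change one level up then looks something like this in outline. The names here (Estimator, candidateItemIDs, and so on) are made up for illustration, not the actual Taste API; the point is only that the outer loop runs over a much smaller candidate set than "all items the user hasn't rated":

    import java.util.*;

    // Illustrative only: 'Estimator' and these names are hypothetical.
    interface Estimator { double estimate(long userID, long itemID); }

    static Map<Long, Double> scoreCandidates(long userID,
                                             Collection<Long> candidateItemIDs,
                                             Set<Long> itemsUserHasRated,
                                             Estimator estimator) {
      Map<Long, Double> scores = new HashMap<Long, Double>();
      for (long itemID : candidateItemIDs) {
        if (!itemsUserHasRated.contains(itemID)) {  // only unrated items
          scores.put(itemID, estimator.estimate(userID, itemID));
        }
      }
      return scores;  // caller keeps the top N as the recommendations
    }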
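And the "finish it off by multiplying by the user rating vector" step, as a sketch: with an item-item weight matrix S (A'A, or some transformation of it) and a user's preference vector p (0 where unrated, for the sake of the sketch), the score for item i is row i of S dotted with p. Each row's product is independent of the others, which is exactly what makes this parallelizable:

    // s: item-item weight matrix; p: one user's preference vector.
    static double[] recommendScores(double[][] s, double[] p) {
      double[] scores = new double[s.length];
      for (int i = 0; i < s.length; i++) {
        double dot = 0.0;
        for (int j = 0; j < p.length; j++) {
          dot += s[i][j] * p[j];
        }
        scores[i] = dot;  // each row is independent: trivially parallel
      }
      return scores;      // sort descending, skipping items already rated
    }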
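As for the first mapreduce itself, producing A'A at least decomposes naturally by user: conceptually the map step emits an (i, j) key for every pair of items a single user has rated, and the reduce step sums counts per pair. Here is that idea as single-process Java, with none of the actual Hadoop plumbing:

    import java.util.*;

    // itemsByUser: for each user, the ids of the items that user has rated.
    // The outer loop is the "map" step (per-user, embarrassingly parallel);
    // the get/put pair plays the "reduce" role, summing counts per item pair.
    static Map<String, Integer> cooccurrenceCounts(int[][] itemsByUser) {
      Map<String, Integer> counts = new HashMap<String, Integer>();
      for (int[] items : itemsByUser) {
        for (int i : items) {
          for (int j : items) {
            String pair = i + "," + j;
            Integer old = counts.get(pair);
            counts.put(pair, old == null ? 1 : old + 1);
          }
        }
      }
      return counts;  // entry "i,j" is (A'A)ij for a binary A
    }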
