The ith row of the matrix A'A contains all items and their similarity degrees to the item represented by the ith column of A. I think it is enough to use only a subset of A'A in the final step, namely the rows that correspond to the items in the active user's history. By the way, I would also like to contribute to the implementation, once we decide on the algorithm.
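A minimal sketch of that final step, assuming sparse vectors are plain Python dicts {item_id: value} (the dict representation and function names here are just for illustration, not Mahout's API). Only the rows of A'A for items the active user has rated need to be loaded, since (A'A)h is the preference-weighted sum of exactly those rows:

```python
def recommend(cooccurrence_rows, user_history, top_n=3):
    """cooccurrence_rows: {item: {other_item: similarity}} -- only the rows
    for items appearing in user_history need to be side-loaded.
    user_history: {item: preference} for the active user (the vector h).
    Returns the top_n (item, score) pairs of (A'A)h, highest score first."""
    scores = {}
    for item, pref in user_history.items():
        for other, sim in cooccurrence_rows.get(item, {}).items():
            if other in user_history:
                continue  # skip items the user already has
            scores[other] = scores.get(other, 0.0) + sim * pref
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]
```

This is why A'A never has to fit in memory at once: each needed row can be streamed in, scaled by h's entry for that item, and accumulated.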
On Fri, Dec 4, 2009 at 10:33 AM, Sean Owen <[email protected]> wrote:
> Yes, this makes sense. I do need two passes. One pass converts input
> from "user,item,rating" triples into user vectors. Then the second
> step builds the co-occurrence A'A product. I agree that it will be
> faster to take a shortcut than properly compute A'A.
>
> (Though I'm curious how this works -- looks deceptively easy, this
> outer product approach. Isn't v cross v potentially huge? or likely to
> be sparse enough to not matter)
>
> I understand the final step in principle, which is to compute (A'A)h.
> But I keep guessing A'A is too big to fit in memory? So I can
> side-load the rows of A'A one at a time and compute it rather
> manually.
>
> On Thu, Dec 3, 2009 at 8:28 PM, Ted Dunning <[email protected]> wrote:
> > I think you can merge my passes into a single pass in which you compute the
> > row and column sums at the same time that you compute the product. That is
> > more complicated, though, and I hate fancy code. So you are right in
> > practice that I have always had two passes. (although pig might be clever
> > enough by now to merge them)
> >
> > There is another pass in which you use all of the sums to do the
> > sparsification. I don't know if that could be done in the same pass or not.

--
Gökhan Çapan
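As to Sean's question about v cross v being huge: a sketch of the outer-product shortcut, again assuming user vectors are plain dicts {item: rating} (names and representation are illustrative, not the actual Mahout code). A'A is the sum over users of v v', and each user's outer product only touches the items that user rated, so the per-user work is tiny when the data is sparse:

```python
from collections import defaultdict

def cooccurrence(user_vectors):
    """user_vectors: iterable of {item: rating} dicts (the rows of A).
    Returns A'A as {item: {item: value}} by accumulating, per user,
    the outer product v v' over only that user's nonzero entries."""
    aa = defaultdict(lambda: defaultdict(float))
    for v in user_vectors:
        for i, vi in v.items():
            for j, vj in v.items():
                aa[i][j] += vi * vj  # one entry of the outer product v v'
    return {i: dict(row) for i, row in aa.items()}
```

A user with k rated items contributes only k*k entries, so the product stays sparse as long as no single user has rated a huge fraction of the items.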
