On Tue, Sep 8, 2009 at 11:16 PM, Ted Dunning <[email protected]> wrote:
> This can be described using the method that I use by just moving
> parentheses. The big off-line version of recommendation is (A'A) h, while
> the more on-line version is A' (A h). The overall shape of the
> computation is the same, but the first form allows the majority of the
> computation to be done off-line.
>
> My focus in the past has always been to produce recs really fast (sometimes
> hundreds per second). As is generally the case, doing as much computation
> as possible ahead of time is a great way to get fast response.
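Just to make sure I'm reading the notation right, here's a toy sketch (plain Python, made-up data; A as a users-by-items matrix and h as a new user's item vector are my assumptions) of the two groupings -- same scores, different work split:

```python
# Assumed convention: A is a (users x items) rating matrix, h is a new
# user's item-preference vector. Both groupings yield the same scores;
# only the amount of work done ahead of time differs.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    Nt = transpose(N)
    return [[sum(a * b for a, b in zip(row, col)) for col in Nt] for row in M]

# hypothetical toy data: 3 users x 4 items, binary preferences
A = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]]
h = [1, 0, 1, 0]   # the new user's preferences over the 4 items

At = transpose(A)

# Off-line form: precompute the (items x items) cooccurrence matrix A'A
# once; each recommendation request is then a single small mat-vec.
AtA = matmul(At, A)
offline = matvec(AtA, h)

# On-line form: two skinny mat-vecs per request, nothing precomputed,
# so new rows of A (new ratings) take effect immediately.
online = matvec(At, matvec(A, h))

assert offline == online  # both give [3, 1, 3, 1]
```

Which is exactly the trade-off: the first form is fast per request but only as fresh as the last batch job; the second sees new data immediately but redoes the work every time.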
What about incorporating new information at runtime? For example, think of the first-time user who rates 3 things and then... waits until the next run of the offline process? That's my concern, along these lines. I completely agree that, if you can get away with offline computation, things can be much more efficient. I see a spectrum of needs developing -- from completely online to completely offline. I think we need more of the completely-offline stuff to support cases where there is no real-time requirement and scale dictates it just has to be offline.

> I tend to view requirements for lots of hooks and modifications as an
> indication that the underlying algorithms are not phrased well. That isn't
> always true, but it often is. Methods that depend on normal-distribution
> assumptions often have pathological performance for small counts. This
> leads to lots of "if (n > 5)" sorts of hacks to avoid problematic cases.
> These methods include most chi-squared techniques, correlation-based
> recommendation engines, and LSA. The right answer is to avoid the original
> mistake and use log-likelihood ratio tests, probabilistic cooccurrence
> measures, and LDA, respectively.

+1, I have come to very much agree. Not all 'hooks' are bad, though, and I am sure a matrix-based approach can accommodate a lot of them.

So let's visit the matrix approach a bit here -- what is A'A? Is this the item-item similarity matrix? Working backwards, it seems like that's the thing to left-multiply with the user rating vector to get recommendations. The first question I have is how this copes with missing elements. I understand the idea is to use a sparse representation, but the nature of the computation means those elements will be treated as zero, which doesn't work for rating data.
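To make the missing-elements concern concrete, here's a small sketch (plain Python, invented 1-5 ratings) of how "missing means zero" skews even something as simple as a user's mean, and the usual mean-centering workaround:

```python
# Hypothetical ratings on a 1-5 scale. An absent key means "unrated",
# NOT "rated zero" -- but a sparse mat-vec can't tell the difference.
ratings = {"A": 5, "B": 4}        # this user has never seen item C
items = ["A", "B", "C"]

# Naive dense view: missing treated as 0. On a 1-5 scale an implicit 0
# reads as "worse than hated it", so the mean is dragged down.
dense_mean = sum(ratings.get(i, 0) for i in items) / len(items)   # 3.0

# Sparse-aware view: average only over items actually rated.
sparse_mean = sum(ratings.values()) / len(ratings)                # 4.5

# One common workaround: mean-center the rated entries first, so the
# implicit 0 for unrated items at least means "no information" rather
# than "strong dislike".
centered = {i: r - sparse_mean for i, r in ratings.items()}
# centered == {"A": 0.5, "B": -0.5}
```

So for binary/cooccurrence data the zeros are harmless, but for ratings it seems like some normalization (mean-centering or similar) has to happen before the matrix formulation makes sense -- is that the idea?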
