On Tue, Sep 8, 2009 at 11:16 PM, Ted Dunning<[email protected]> wrote:
> This can be described using the method that I use by just moving
> parentheses.  The big off-line version of recommendation is (A' A) h while
> the more on-line version is A' (A h).  The overall shape of the
> computation is the same, but the first form allows the majority of the
> computation to be done off-line.
>
> My focus in the past has always been to produce recs really fast (sometimes
> hundreds per second).  As is generally the case, doing as much computation
> as possible ahead of time is a great way to get fast response.
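
To make the parenthesization concrete, here's a tiny self-contained sketch
(plain arrays and made-up data, nothing Mahout-specific): both groupings give
the same item scores, but in the first form the item-item product A'A can be
computed once, offline, leaving only a single multiply per request.

public class AssociativityDemo {

  // y = M x
  static double[] mult(double[][] m, double[] x) {
    double[] y = new double[m.length];
    for (int i = 0; i < m.length; i++)
      for (int j = 0; j < x.length; j++)
        y[i] += m[i][j] * x[j];
    return y;
  }

  // y = M' x, without materializing the transpose
  static double[] multTranspose(double[][] m, double[] x) {
    double[] y = new double[m[0].length];
    for (int i = 0; i < m.length; i++)
      for (int j = 0; j < y.length; j++)
        y[j] += m[i][j] * x[i];
    return y;
  }

  // A'A: the item-item matrix, the part worth precomputing offline
  static double[][] ata(double[][] a) {
    int n = a[0].length;
    double[][] s = new double[n][n];
    for (double[] row : a)
      for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
          s[i][j] += row[i] * row[j];
    return s;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 0, 1}, {0, 1, 1}, {1, 1, 0}}; // users x items
    double[] h = {1, 0, 0};                           // one user's history
    double[] offline = mult(ata(a), h);               // (A'A) h
    double[] online = multTranspose(a, mult(a, h));   // A' (A h)
    System.out.println(java.util.Arrays.toString(offline)); // [2.0, 1.0, 1.0]
    System.out.println(java.util.Arrays.toString(online));  // [2.0, 1.0, 1.0]
  }
}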

What about incorporating new information at runtime? Think of the
first-time user who rates 3 things and then... waits until the next
run of the offline process? That's my concern along these lines.

I completely agree that, if you can get away with offline computation,
things can be much more efficient.

I see a spectrum of needs developing -- from completely online to
completely offline. I think we need more of the completely-offline
stuff to support cases where there is no real-time requirement, and
scale dictates it just has to be offline.

> I tend to view requirements for lots of hooks and modifications as an
> indication that the underlying algorithms are not phrased well.  That isn't
> always true, but it often is.  Methods that depend on normal
> distribution assumptions often have pathological performance for small
> counts.  This leads to lots of things like "if (n > 5)" sorts of hacks to
> avoid problematic cases.  These methods include most chi-squared techniques,
> correlation-based recommendation engines and LSA.  The right answer is to
> avoid the original mistake and use log-likelihood ratio tests, probabilistic
> cooccurrence measures and LDA respectively.

+1, I have come to very much agree. Not all 'hooks' are bad, though; I
am sure a matrix-based approach can accommodate a lot of things like
this.
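
Since the log-likelihood ratio test keeps coming up: for anyone who hasn't
seen it, here is a self-contained sketch of the G^2 statistic for a 2x2
cooccurrence table (the names are mine, not from any existing class). The
point is that it stays well behaved at small counts, exactly where
chi-squared falls apart.

final class Llr {

  // x * log(x), with 0 * log(0) taken as 0
  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized entropy: (sum) log(sum) - sum of x log x
  private static double entropy(long... counts) {
    long sum = 0;
    double sumXLogX = 0.0;
    for (long count : counts) {
      sum += count;
      sumXLogX += xLogX(count);
    }
    return xLogX(sum) - sumXLogX;
  }

  // k11 = A and B together, k12 = A without B,
  // k21 = B without A,      k22 = neither
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }

  public static void main(String[] args) {
    // A rare pair seen together 3 times out of 100 events still gets a
    // sensible score instead of an "if (n > 5)" escape hatch.
    System.out.println(logLikelihoodRatio(3, 2, 1, 94));
  }
}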


So maybe let's visit the matrix approach a bit here -- what is A'A? Is
this the similarity matrix? Working backwards, it seems like that's
the thing to multiply the user rating vector by to get
recommendations.
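
Sketching it out with 0/1 data seems to confirm it: entry (i, j) of A'A
counts the users who touched both item i and item j, so it is exactly the
item-item cooccurrence matrix. A toy example (made-up data):

public class CooccurrenceDemo {
  public static void main(String[] args) {
    // users x items, 1 = interacted
    int[][] a = {
        {1, 1, 0},   // user 0 touched items 0 and 1
        {1, 0, 1},   // user 1 touched items 0 and 2
        {0, 1, 1},   // user 2 touched items 1 and 2
    };
    int nItems = a[0].length;
    int[][] cooccurrence = new int[nItems][nItems];
    for (int[] user : a)
      for (int i = 0; i < nItems; i++)
        for (int j = 0; j < nItems; j++)
          cooccurrence[i][j] += user[i] * user[j]; // (A'A)[i][j]
    // Diagonal = item popularity; off-diagonal = cooccurrence counts.
    for (int[] row : cooccurrence)
      System.out.println(java.util.Arrays.toString(row));
  }
}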

The first question I have is: how does this cope with missing
elements? I understand the idea is to use a sparse representation, but
the nature of the computation means those elements will be treated as
zero, which doesn't work for ratings.
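
To make the concern concrete with a made-up example: even a step as basic
as mean-centering a user's ratings goes wrong if the stored zeros are taken
at face value.

public class MissingAsZero {
  public static void main(String[] args) {
    // Hypothetical user on a 1-5 scale; 0 marks "not rated" in storage.
    double[] ratings = {5, 0, 4, 0, 3};

    double sumAll = 0.0, sumRated = 0.0;
    int rated = 0;
    for (double r : ratings) {
      sumAll += r;
      if (r > 0) { sumRated += r; rated++; }
    }

    // Taking the stored zeros at face value drags the mean down...
    double naiveMean = sumAll / ratings.length;   // 2.4
    // ...while the user's actual average rating is much higher.
    double trueMean = sumRated / rated;           // 4.0

    // Centered with the naive mean, an unrated item scores 0 - 2.4 = -2.4:
    // "never saw it" becomes indistinguishable from "disliked it".
    System.out.println(naiveMean + " vs " + trueMean);
  }
}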
