On Mon, Jul 13, 2009 at 9:21 AM, Sean Owen <[email protected]> wrote:

> It would be interesting to see how it scales
> indeed.
>

It scales very well.  At Veoh we were serving about 400 queries per second
at one point.  This included searches and recommendations, but as I recall,
at one time more than half were recommendations.


> This doesn't include a notion of item ratings (well, maybe the
> "documents" can include the item tokens several times to indicate a
> stronger association) but that is not a necessary condition for good
> recommendations.
>

Actually it does.  That is in the off-line part.

But, as you likely know by now, I am an anti-fan of using ratings for
recommendations.  I think that the data is suspect and is generally about
two orders of magnitude smaller than other viewing data.  Given that it is
lower quality and vastly smaller, I see no utility in actually spending
thought on using that kind of data.  Often you can use that data for free,
but that is the only price I would pay.

This is not the same as saying you should not allow users to rate things and
share ratings and so on.  Users enjoy doing that.  I just think that the
data is next to useless compared to the alternatives.


> I think the equivalent in CF is a combination of 1)
> an item-based recommender and 2) the log-likelihood similarity metric.
>

Indeed.  And the Lucene-based recommender effectively uses (2) twice: first
in the off-line reduction of data, and second in the implicit weighting
performed by Lucene.
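
For readers who have not seen it, the log-likelihood similarity in (2) can
be sketched as below.  This is a minimal Python sketch of the G^2
log-likelihood ratio on a 2x2 cooccurrence table, not code from the system
described here.  The function names and the count layout (k11 = users who
touched both items, k12 and k21 = users who touched only one of them,
k22 = users who touched neither) are assumptions made for illustration.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # G^2 statistic: 2 * (row entropy + column entropy - matrix entropy).
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))
```

A score near zero means the two items cooccur about as often as chance
would predict; a large score marks a pair worth keeping as an indicator.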

It is also useful to note that it is a piece of cake to integrate various
search functions into this kind of architecture.  Thus, filtering
recommendations by some boolean constraint, or biasing them with a textual
query or a recency preference, is trivial.
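
To make that concrete, here is a toy Python sketch of the idea, with a plain
loop standing in for the search engine.  Everything in it is hypothetical:
the `recommend` function, the item fields (`id`, `indicators`, `text`), the
`must` predicate, and the `text_boost` weight are illustrative assumptions,
and the overlap count is only a stand-in for the weighting that Lucene would
apply.

```python
def recommend(items, history, must=None, query_terms=(), text_boost=0.5):
    """Rank items by indicator overlap with the user's history.

    must        -- optional predicate; items failing it are dropped (filter)
    query_terms -- optional words that add a bonus to matching items (bias)
    """
    history = set(history)
    terms = {t.lower() for t in query_terms}
    scored = []
    for item in items:
        # Boolean constraint: a hard filter, applied before scoring.
        if must is not None and not must(item):
            continue
        # Recommendation score: how many indicators the history matches.
        score = float(len(history & set(item["indicators"])))
        # Textual query: a soft bonus added to the same score.
        words = set(item.get("text", "").lower().split())
        score += text_boost * len(terms & words)
        if score > 0:
            scored.append((score, item["id"]))
    # Highest score first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored]
```

Because the filter and the bias are just extra clauses on the same query,
adding either one does not change the recommender itself.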
