On Mon, Jul 13, 2009 at 9:21 AM, Sean Owen <[email protected]> wrote:

> It would be interesting to see how it scales
> indeed.

It scales very well. At Veoh we were serving about 400 queries per second at one point. This included searches and recommendations, but as I recall, at one time more than half were recommendations.

> This doesn't include a notion of item ratings (well, maybe the
> "documents" can include the item tokens several times to indicate a
> stronger association) but that is not a necessary condition for good
> recommendations.

Actually it does. That is in the off-line part.

But, as you likely know by now, I am an anti-fan of using ratings for recommendations. I think that the data is suspect and is generally about two orders of magnitude smaller than other viewing data. Given that it is lower quality and vastly smaller, I see no utility in actually spending thought on using that kind of data. Often you can use that data for free, but that is the only price I would pay.

This is not the same as saying you should not allow users to rate things and share ratings and so on. Users enjoy doing that. I just think that the data is next to useless compared to the alternatives.

> I think the equivalent in CF is a combination of 1)
> an item-based recommender and 2) the log-likelihood similarity metric.

Indeed. And the Lucene-based recommender effectively uses (2) twice: first in the off-line reduction of data, second in the implicit weighting performed by Lucene.

It is also useful to note that it is a piece of cake to integrate various search functions into this kind of architecture. Thus, filtering recommendations by some boolean constraint, or tainting them with a textual query or recency preference, is literally trivial.
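
For readers unfamiliar with the log-likelihood similarity mentioned above, here is a minimal sketch of the log-likelihood ratio (G^2) test on a 2x2 co-occurrence table. It assumes the usual layout: k11 counts users who viewed both items, k12 and k21 count users who viewed only one of them, and k22 counts users who viewed neither. The function names are illustrative and not taken from any particular library, and this is not necessarily how the system described above computed it.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: both items, k12: item A only, k21: item B only, k22: neither.
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb small negative values from rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))

if __name__ == "__main__":
    # Independent items score near zero; strongly co-occurring items score high.
    print(log_likelihood_ratio(10, 10, 10, 10))
    print(log_likelihood_ratio(10, 0, 0, 10))
```

In an off-line pass of this kind, item pairs whose score falls below some chosen threshold would be dropped, leaving only the anomalously frequent co-occurrences to be indexed as "documents".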
