Nice, good wisdom here.
I agree about the appeal and problems of thinking of item-attribute pairs as your items. You're saying that content-based recommendation, in practice, is often a matter of substituting one dominant item attribute for items -- recommending on artist rather than on track. OK, check: one can do that in the current framework by using artists as the items, so I think that's supported for free. And maybe my other notion of a way to bring content-based recommendation into the framework -- an organized framework for constructing and tuning a notion of item similarity based on attributes -- also has merit and belongs in the category of "content-based" techniques. I ask because there have been some requests to talk more about content-based recommendation, and so I want to build this out more.

On Tue, Jan 26, 2010 at 11:36 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> I define it a bit differently, by redefining recommendation as machine
> learning.
>
> Users have preferences for objects with attributes.
>
> We would like to learn from all user/object/attribute preference data to
> predict so-far unobserved preferences of a user for other objects.
>
> Normal recommendation is a subset of this where there is exactly one id
> attribute for every object.
>
> We can extend most recommendation algorithms to this new paradigm relatively
> transparently by considering each expressed item preference to be a bundle
> of attribute preferences. Our recommendation algorithm then needs to produce
> a list of recommended attributes, which we integrate into a list of
> recommended items. The list of recommended attributes might be segregated
> into a list of values for each kind of attribute, or it might be a single
> list. The segregated approach could just replicate a recommendation engine
> per attribute type. The combined approach might just label all attributes
> and throw them into a single soup of preference data.
> The additional code needed consists mostly of integrating the attribute
> recommendations into a list of item recommendations. This can be as simple
> as weighting the recommended attributes by rank and doing rankScore * idf
> retrieval to find the items. Some algorithms, like LDA, have the ability to
> explicitly integrate the different kinds of attributes. Others really don't.
>
> One problem with this is that you are exploding the number of preferences,
> which can present scaling and noise problems. You also inherently
> intermingle attributes with very different distributional characteristics.
> For instance, there might only be a dozen or so colors of shoes, and thus
> the number of people who have expressed a preference for some kind of red
> shoe is going to be vastly larger than the number of people who have
> expressed a preference for a specific color of a specific size of a specific
> model of shoe. It is common for recommendation systems to fail for very
> common things or for very rare things, and integrating both pathological
> situations in a single recommendation framework may be a problem.
>
> My own experience with this is that it is common for one kind of attribute
> to dominate the recommendation process, in the sense of providing the most
> oomph and accuracy. This can be because the data is sparse and some
> attribute provides useful smoothing, or it can be that some attributes are
> too general and other attributes provide more precision. At Musicmatch, for
> instance, the artist attribute provided a disproportionate share of music
> recommendation value, above track or album or even song (track != song,
> because it is common for the same song to be on many albums, giving many
> tracks).
> I think that this must only be true to first order, and that if you dig in,
> you would find minority classes where different attributes provide different
> amounts of data, but it is rare in startups to get past the first-order
> solution.
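For what it's worth, the "bundle of attribute preferences" idea above can be sketched in a few lines of Python. Everything here (the toy catalog, the `explode` function, the labeling scheme) is illustrative, not an existing API; it just shows the "soup" variant, where every attribute value is labeled with its type and thrown into one combined preference list:

```python
from collections import Counter

# Illustrative catalog: each item is a bundle of typed attribute values.
ITEM_ATTRS = {
    "track:1": {"artist": "Miles Davis", "album": "Kind of Blue", "genre": "jazz"},
    "track:2": {"artist": "Miles Davis", "album": "Milestones", "genre": "jazz"},
    "track:3": {"artist": "Kraftwerk", "album": "Autobahn", "genre": "electronic"},
}

def explode(item_prefs):
    """Turn each item preference into labeled attribute preferences.

    Labeling values with their attribute type ("artist=Miles Davis") lets
    all attribute kinds share a single preference soup; a per-attribute
    recommender would instead keep one Counter per attribute type.
    """
    attr_prefs = Counter()
    for item, weight in item_prefs.items():
        for attr, value in ITEM_ATTRS[item].items():
            attr_prefs[f"{attr}={value}"] += weight
    return attr_prefs

# Two jazz-track preferences become strong artist/genre preferences.
prefs = explode({"track:1": 1.0, "track:2": 1.0})
```

Note how this also shows the explosion problem: two item preferences became six attribute preferences, and that multiplier grows with the number of attributes per item.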
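And the integration step -- weight the recommended attributes by rank, then retrieve items by rankScore * idf -- might look like the following sketch. Again, the data, the 1/rank weighting, and the function names are my own assumptions for illustration:

```python
import math
from collections import defaultdict

# Illustrative catalog: items described by labeled attribute values.
ITEM_ATTRS = {
    "track:1": {"artist=Miles Davis", "genre=jazz"},
    "track:2": {"artist=Miles Davis", "genre=jazz"},
    "track:3": {"artist=Kraftwerk", "genre=electronic"},
}

def idf(attr):
    """Inverse document frequency of an attribute value over the catalog.

    Rare attribute values (a specific artist) score higher than common
    ones (a broad genre), which counteracts the distribution problem of
    very common attributes swamping the recommendations.
    """
    n = len(ITEM_ATTRS)
    df = sum(1 for attrs in ITEM_ATTRS.values() if attr in attrs)
    return math.log((1 + n) / (1 + df))

def integrate(ranked_attrs):
    """Fold a ranked attribute list into scored item recommendations.

    Each attribute gets a rank score (here simply 1/rank), and every item
    carrying that attribute accumulates rankScore * idf.
    """
    scores = defaultdict(float)
    for rank, attr in enumerate(ranked_attrs, start=1):
        weight = (1.0 / rank) * idf(attr)
        for item, attrs in ITEM_ATTRS.items():
            if attr in attrs:
                scores[item] += weight
    return sorted(scores.items(), key=lambda kv: -kv[1])

recs = integrate(["genre=electronic", "artist=Miles Davis"])
```

In this toy run the top-ranked attribute is also the rarest, so "track:3" wins; with a more common top attribute the idf term would pull the ranking the other way.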