Nice, good wisdom here.

I agree about the appeal and problems of thinking of item-attribute
pairs as your items.

You're saying content-based recommendation, in practice, is often a
matter of substituting one dominant item attribute in place of items
-- recommending on artist, rather than track. OK, check: one can do
that in the current framework by using artists as items. So I think
that's supported for free.

And maybe my other notion of a way to bring content-based
recommendation into the framework -- some organized framework for
constructing and tuning a notion of item similarity based on
attributes -- also has merit and belongs in the category of
"content-based" techniques.
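
For concreteness, here's a minimal sketch of what I mean by a
constructible, tunable attribute-based item similarity (the attribute
names and weights are hypothetical, just for illustration):

```python
# Hypothetical sketch: item similarity as a tunable weighted
# combination of per-attribute matches. Items are dicts of
# attribute -> value; weights let you tune each attribute's influence.
def attribute_similarity(item_a, item_b, weights):
    score = 0.0
    total = 0.0
    for attr, weight in weights.items():
        total += weight
        if item_a.get(attr) == item_b.get(attr):
            score += weight
    return score / total if total else 0.0

track_a = {"artist": "X", "album": "P", "genre": "rock"}
track_b = {"artist": "X", "album": "Q", "genre": "rock"}
# weight artist heavily, per the discussion below
weights = {"artist": 3.0, "album": 1.0, "genre": 1.0}
# artist and genre match: (3 + 1) / 5 = 0.8
```

Tuning those weights against held-out preference data would be the
"organized framework" part.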


I ask because there have been some requests to talk more about
content-based recommendation, and so I want to build this out more.


On Tue, Jan 26, 2010 at 11:36 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> I define it a bit differently by redefining recommendations as machine
> learning.
>
> Users have preferences for objects with attributes.
>
> We would like to learn from all user/object/attribute preference data to
> predict so-far unobserved preferences of a user for other objects.
>
> Normal recommendation is a subset of this where there is exactly one id
> attribute for every object.
>
> We can extend most recommendation algorithms to this new paradigm relatively
> transparently by considering each expressed item preference to be a bundle
> of attribute preferences.  Our recommendation algorithm needs to produce a
> list of recommended attributes which we integrate into a list of recommended
> items.  The list of recommended attributes might be segregated into a list
> of values for each kind of attribute or it might be in a single list.  The
> segregated approach could just replicate a recommendation engine per
> attribute type.  The combined approach might just label all attributes and
> throw them into a soup of preference data.
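
To make the bundle idea concrete, here's a tiny sketch of the
"combined soup" approach described above, with hypothetical names:

```python
# Sketch of the "bundle of attribute preferences" idea: one expressed
# item preference fans out into one preference per (attribute, value)
# pair, all labeled and thrown into a single pool of preference data.
def explode_preferences(user, item_id, attributes):
    """attributes: dict of attribute name -> value for the item."""
    prefs = [(user, ("id", item_id))]  # the item id is itself an attribute
    for attr, value in attributes.items():
        prefs.append((user, (attr, value)))
    return prefs

# A preference for one track becomes preferences for its artist,
# album, and genre as well.
prefs = explode_preferences("alice", "track-42",
                            {"artist": "X", "album": "P", "genre": "rock"})
```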
>
> The additional code needed consists mostly of writing the code that
> integrates the attribute recommendations into a list of item
> recommendations.  This can be as simple as weighting the recommended
> attributes by rank and doing rankScore * idf retrieval to find the items.
> Some algorithms like LDA have the ability to explicitly integrate the
> different kinds of attributes.  Others really don't.
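
A minimal sketch of that integration step (the specific rank weighting
and idf formula here are my own guesses at one reasonable choice, not
necessarily what Ted has in mind):

```python
import math

# Sketch: integrate a ranked list of recommended (attribute, value)
# pairs back into a ranked list of items via rankScore * idf scoring.
def score_items(recommended_attrs, items, num_items):
    """recommended_attrs: list of (attribute, value) in rank order.
    items: dict of item_id -> set of (attribute, value) pairs."""
    # document frequency of each attribute-value pair across items
    df = {}
    for attrs in items.values():
        for av in attrs:
            df[av] = df.get(av, 0) + 1
    scores = {}
    for rank, av in enumerate(recommended_attrs):
        rank_score = 1.0 / (rank + 1)  # higher-ranked attributes weigh more
        idf = math.log((1 + num_items) / (1 + df.get(av, 0)))
        for item_id, attrs in items.items():
            if av in attrs:
                scores[item_id] = scores.get(item_id, 0.0) + rank_score * idf
    return sorted(scores, key=scores.get, reverse=True)

items = {
    "a": {("artist", "X"), ("genre", "rock")},
    "b": {("artist", "Y"), ("genre", "rock")},
    "c": {("artist", "X"), ("genre", "jazz")},
}
ranked = score_items([("artist", "X"), ("genre", "rock")], items, 3)
```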
>
> One problem with this is that you are exploding the number of preferences,
> which can present scaling and noise problems.  You also inherently
> intermingle attributes with very different distributional
> characteristics.  For instance, there might only be a dozen or so colors of shoes
> and thus the number of people who have expressed a preference for some kind
> of red shoe is going to be vastly larger than the number of people who have
> expressed a preference for a specific color of a specific size of a specific
> model of a shoe.  It is common for recommendation systems to fail for very
> common things or for very rare things and integrating both pathological
> situations in a single recommendation framework may be a problem.
>
> My own experience with this is that it is common for one kind of attribute
> to dominate the recommendation process in the sense of providing the most
> oomph and accuracy.  This can be because the data is sparse and some
> attributes provide useful smoothing, or it can be that some attributes are too
> general and other attributes provide more precision.  At Musicmatch, for
> instance, the artist attribute provided a disproportionate share of music
> recommendation value above track or album or even song (track != song
> because it is common for the same song to be on many albums giving many
> tracks).  I think that this must only be true to first order and that if you
> dig in, you would find minority classes where different attributes provide
> different amounts of data, but it is rare in startups to get past the first
> order solution.
