On Wed, Oct 1, 2008 at 3:49 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> If I understand the article correctly, they go through several possible 
> approaches to building REs - user-item CF (too expensive), user clustering 
> (low quality), search-based (low quality) - and then present their novel 
> item-to-item CF as something that doesn't have performance/scale/quality 
> problems.  I think they said it works as my example above, but it doesn't 
> make sense to me - by doing the above all I'm doing is looking at similarity 
> between items that Joe already consumed, and I'm really after *new* things to 
> recommend.

I have not seen this before. I think I had just read somewhere that the
item-based approach I outlined (which is also what Collective
Intelligence describes) is "what Amazon does", and that could be wrong.

They first say, basically: compute only as much of the item-item
similarity matrix as makes sense -- if nobody ever bought X and Y
together, don't bother computing that entry. OK. As far as my stuff
goes, similarities are not all precomputed (in the interest of time) but
are cached as much as memory permits. Taking it to the extreme of a
system that can cache the whole table -- yeah, that's going to help
speed!
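
To make that concrete, here is a minimal sketch of building only the
entries of the item-item table for pairs that actually co-occur in
somebody's history. This is just my illustration, not Amazon's code, and
the choice of cosine similarity is my assumption:

    from collections import defaultdict
    from itertools import combinations
    from math import sqrt

    def build_similar_items_table(user_prefs):
        """user_prefs: {user: {item: rating}}. Only item pairs that
        co-occur in at least one user's history get an entry."""
        dot = defaultdict(float)   # dot products, only for co-occurring pairs
        norm = defaultdict(float)  # squared norm of each item's rating vector
        for prefs in user_prefs.values():
            for item, r in prefs.items():
                norm[item] += r * r
            for (i1, r1), (i2, r2) in combinations(sorted(prefs.items()), 2):
                dot[(i1, i2)] += r1 * r2
        table = defaultdict(dict)
        for (i1, i2), d in dot.items():
            sim = d / (sqrt(norm[i1]) * sqrt(norm[i2]))
            table[i1][i2] = sim
            table[i2][i1] = sim
        return table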

It says...

"Given a similar-items table, the algorithm finds items similar to
each of the user's purchases and ratings, aggregates those items, and
then recommends the most popular or correlated items."

It's not clear exactly what this means: whether the top recommendations
are merely items that are most similar to *something* you bought/rated,
or most similar *on average to everything you bought/rated*, or a
weighted average like we were discussing (sketched below). I kinda think
it has to be the latter, for two reasons:

1) If I hated something, I don't want to recommend things like it! So
the rating seems to matter.
2) Taking the rating into account does not appreciably slow anything
down.
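
Here is the weighted-average reading of that sentence, as a sketch of my
interpretation rather than anything Amazon has published -- items
similar to things you rated highly get pushed up, items similar to
things you hated get pushed down:

    def recommend(user_items, similar_items_table, n=10):
        """user_items: {item: rating} for one user.
        similar_items_table: {item: {other_item: similarity}}.
        Scores candidates by a rating-weighted average of similarities."""
        weighted_sum = {}
        sim_total = {}
        for item, rating in user_items.items():
            for other, sim in similar_items_table.get(item, {}).items():
                if other in user_items:
                    continue  # don't recommend what the user already has
                weighted_sum[other] = weighted_sum.get(other, 0.0) + sim * rating
                sim_total[other] = sim_total.get(other, 0.0) + abs(sim)
        scores = {other: weighted_sum[other] / sim_total[other]
                  for other in weighted_sum if sim_total[other] > 0}
        return sorted(scores, key=scores.get, reverse=True)[:n]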


>> Is this interpretation correct?  If it is, I don't quite yet see the
>> benefit of computing similarity between items for a single person.
>> Of course they will be similar, unless we want to create groups of
>> items for a single person with a diverse set of interests for some
>> reason.

Oh, no, I think they are saying they compute similarity between all
pairs of items in the catalog -- well, all pairs that actually
co-occur in some user's purchase/rating history.


> In my case I'm interested in working with binary preferences, though, so if I 
> understand things correctly weighted average doesn't really make sense in 
> that context.  Is this correct?

Yeah, er, it still works to an extent, but not really meaningfully. It
seems that other approaches are more appropriate, and faster, in this
special case. I haven't dug into this much; CI covers it reasonably
well. You or I could sit down and adapt the common approaches to this
special case in any number of clever ways, I'm sure.
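
One common adaptation, for instance -- and I'm only offering it as an
illustration, not claiming it is what the article or CI prescribes -- is
to swap the similarity metric for the Tanimoto (Jaccard) coefficient
over the sets of users who bought each item, and then just sum
similarities instead of weighting by a rating:

    def tanimoto(users_who_bought_a, users_who_bought_b):
        """Tanimoto/Jaccard similarity between two items, given the
        sets of users who bought each: |A & B| / |A | B|."""
        intersection = len(users_who_bought_a & users_who_bought_b)
        if intersection == 0:
            return 0.0
        return intersection / float(len(users_who_bought_a | users_who_bought_b))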


> This sounds exactly like what's described on pages 22-25 in Programming 
> Collective Intelligence.  My understanding was that what's in PCI is "an old 
> known approach" whereas what is in that article about Amazon is a "new, 
> different, and patented approach".  Are they both actually talking about the 
> same algo? i.e. Amazon article is the same as what's on p22-25 in PCI?

This I don't know; my impression was that they're basically doing plain
ol' item-based recommendation -- with, no doubt, a heap of tweaks and
heuristics internally that they aren't going to publish. For example,
one area where one can add a lot of proprietary value is in translating
user actions (purchases, ratings, views) into the "right" inferred
preference value.


> OK, so the summary is that you still have to compute item1-item2 similarity 
> matrix where item1 comes from the set of items consumed by Joe and item2 
> comes from the set of items I want to consider for recommendation (i.e. it 
> can be only a subset of all items, e.g. only new items since the last time 
> Joe was recommended some items 3 days ago).
>
> If my assumption that computing a weighted average doesn't make sense in the 
> context of binary preferences is correct, then the above is really it - it all 
> boils down to computing the item1-item2 matrix based on item-item similarity.
> While doing that we try to make that item1-item2 matrix as small as possible 
> and try to compute it ahead of time, so that at the "give Joe N 
> recommendations" time we can simply take the topN from that item1-item2 matrix.
> Is this correct or am I missing some pieces of the puzzle?

Yeah, that is my understanding. They build as much of the complete
item-item matrix as makes sense ahead of time.
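
The online "give Joe N recommendations" step is then about this simple.
A sketch of my reading of your summary -- candidate_items is whatever
item2 subset you want to consider, e.g. only items added since Joe's
last batch of recommendations, and with binary preferences the score is
just a sum of similarities:

    def top_n_binary(joes_items, similar_items_table, candidate_items, n=10):
        """joes_items: set of items Joe consumed (the item1 set).
        similar_items_table: precomputed {item: {other: similarity}}.
        candidate_items: the item2 set to consider."""
        scores = {}
        for item in joes_items:
            for other, sim in similar_items_table.get(item, {}).items():
                if other in candidate_items and other not in joes_items:
                    scores[other] = scores.get(other, 0.0) + sim
        return sorted(scores, key=scores.get, reverse=True)[:n]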

The only issue with that is, well, you can't just go update it every
time a pref changes. So either you don't immediately respond to new
ratings, or you have to do so in some other clever way.

For my stuff, everything has a nearly immediate effect -- updates take
effect immediately, but due to caching they may not percolate up through
the recommender and actually affect results for a bit. This is a sort of
middle ground between recomputing everything all the time and never
recomputing at all. (And it is still a problematic way to approach
this.)
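
To make the trade-off concrete, this is roughly the shape of it -- a toy
sketch, not how my code actually implements its caches: similarities are
computed on demand and cached with a time-to-live, so a new rating
changes the underlying data immediately, but a cached pair keeps serving
its old value until it expires:

    import time

    class SimilarityCache:
        """Toy cache: on-demand similarity with a TTL, so fresh data
        eventually shows up without recomputing the whole item-item matrix."""
        def __init__(self, compute_similarity, ttl_seconds=3600):
            self.compute_similarity = compute_similarity  # e.g. cosine over current prefs
            self.ttl = ttl_seconds
            self.cache = {}

        def similarity(self, item1, item2):
            key = (item1, item2) if item1 <= item2 else (item2, item1)
            entry = self.cache.get(key)
            if entry is not None and time.time() - entry[1] < self.ttl:
                return entry[0]  # still fresh: serve the cached value
            sim = self.compute_similarity(item1, item2)  # recompute from current data
            self.cache[key] = (sim, time.time())
            return sim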
