Hello, Thanks for the pointers, Grant. Regarding that Amazon item-item recommendation. It looks like that's patented:
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=2&p=1&f=G&l=50&d=PTXT&S1=amazon.ASNM.&OS=AN/(amazon)&RS=AN/amazon Does that mean one cannot implement this in Taste (or any other piece of software)? Even if used in non-shopping purposes? Thanks, Otis ----- Original Message ---- > From: Grant Ingersoll <[EMAIL PROTECTED]> > To: [email protected] > Sent: Monday, September 29, 2008 9:43:58 AM > Subject: Re: Recommending when working with binary data sets > > Not sure I know the answer in terms of Taste, but did a little bit of > digging (mind you, I'm no CF expert, but I'm learning thanks to Taste > and Sean). > > At any rate, came across: > Started at Wikipedia's page: > http://en.wikipedia.org/wiki/Collaborative_filtering > > which lead to http://en.wikipedia.org/wiki/Slope_One, which then has > an interesting comment about Amazon's item-item approach, which, via > Google Scholar leads to: > > http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&pName=dso_level1&path=dsonline/2003_Archives/0301/d&file=wp1lind.xml&xsl=article.xsl&;jsessionid=LghY1grHgYJpBTLpWjX5NtvQwhH1Bkv9rpfXT4VnpVtDNVpfZ8n0!-1404507079 > > In particular, see the "How it Works" section. Essentially, it > describes how they build the item to item similarity matrix, which I > believe is also what you need to do. > > HTH, > Grant > > On Sep 26, 2008, at 1:52 PM, Otis Gospodnetic wrote: > > > Hi, > > > > I've been reading the chapter on recommendations in Programming > > Collective Intelligence and looking at Taste. The examples in PCI > > all assume scenarios where items to recommend have been rated by > > users on some scale. I understand how items can be recommended to > > users using item-based filtering and user-item ratings and why this > > is preferred over user-based filtering when the number of users is > > larger than the number of items. > > But what if all I've got is item-item similarity (content-based) and > > there are no user-item ratings? Say I have a situation where people > > simply either consume content (e.g. read an article, watch a > > video...) or not consume it (don't read an article, don't watch the > > video...). In other words, I really have only yes/no or 1/0 or seen/ > > not seen type "rating". > > > > I can't really use Euclidean distance or Pearson correlation > > coefficient, can I? > > > > What do people use in such scenarios? Would it make sense to use > http://en.wikipedia.org/wiki/Jaccard_index > > for such cases? > > ... Ah, I do see javadoc in TanimotoCoefficientSimilarity saying > > exactly that, good. > > > > But then my question is: > > Doesn't the use of Jaccard/Tanimoto mean going back to the expensive > > user-user similarity computation? > > > > That is, if I need to recommend items for user U1 don't I need to: > > 1) have user-user similarity pre-computed (and recomputed > > periodically) > > 2) find top N users U{2,3,4,...N} who are the most similar to U1 > > 3) then for these top N most similar users find their "seen" items > > that U1 has not seen (possibly limit this to only recently seen items) > > 4) select top N items from 3) and recommend those to U1. > > > > If so, then 1) is again expensive. > > And what how would one go about selecting top N items from the list > > in this case other than ordering them by user-user similarity? > > > > Of course, something is telling me I'm demonstrating that I don't > > yet have the full grasp of item-based filtering. I hope that's the > > case! :) > > > > Thanks, > > Otis > > -------------------------- > Grant Ingersoll > http://www.lucidimagination.com > > Lucene Helpful Hints: > http://wiki.apache.org/lucene-java/BasicsOfPerformance > http://wiki.apache.org/lucene-java/LuceneFAQ
