Yep! I actually want to recommend items of interest, where what counts as an item depends on the context: for an online bookshop, for example, the items are books. A few questions regarding slope-one:

1. Can it be applied to a binary data setting like mine?
2. Do we have an implementation of it in Mahout?
3. Will it scale well?
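For question 1, a minimal weighted slope-one sketch helps show what happens with binary data. It assumes preferences are held in plain in-memory maps. `SlopeOneSketch` and `predict` are hypothetical names and are not Mahout's API. Slope-one predicts from the average rating *differences* between pairs of items. When every stored preference is 1, all of those differences are 0, so every prediction comes out as 1 and there is nothing left to rank by.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of weighted slope-one; not Mahout's implementation.
public class SlopeOneSketch {
    // diffSum[j][i] = sum over users who rated both j and i of (r_j - r_i)
    private final Map<String, Map<String, Double>> diffSum = new HashMap<>();
    // freq[j][i] = number of users who rated both j and i
    private final Map<String, Map<String, Integer>> freq = new HashMap<>();

    public SlopeOneSketch(Map<String, Map<String, Double>> ratingsByUser) {
        for (Map<String, Double> userRatings : ratingsByUser.values()) {
            for (Map.Entry<String, Double> a : userRatings.entrySet()) {
                for (Map.Entry<String, Double> b : userRatings.entrySet()) {
                    if (a.getKey().equals(b.getKey())) continue;
                    diffSum.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                           .merge(b.getKey(), a.getValue() - b.getValue(), Double::sum);
                    freq.computeIfAbsent(a.getKey(), k -> new HashMap<>())
                        .merge(b.getKey(), 1, Integer::sum);
                }
            }
        }
    }

    /** Weighted slope-one prediction of a user's rating for target; NaN if there is no overlap. */
    public double predict(Map<String, Double> userRatings, String target) {
        Map<String, Double> diffs = diffSum.get(target);
        Map<String, Integer> counts = freq.get(target);
        if (diffs == null) return Double.NaN;
        double num = 0.0;
        int den = 0;
        for (Map.Entry<String, Double> e : userRatings.entrySet()) {
            Integer c = counts.get(e.getKey());
            if (c == null) continue;                        // no co-raters for this pair
            double avgDev = diffs.get(e.getKey()) / c;      // average (r_target - r_i)
            num += (avgDev + e.getValue()) * c;             // weight by number of co-raters
            den += c;
        }
        return den == 0 ? Double.NaN : num / den;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> explicit = Map.of(
            "u1", Map.of("A", 5.0, "B", 3.0, "C", 2.0),
            "u2", Map.of("A", 3.0, "B", 4.0),
            "u3", Map.of("B", 2.0, "C", 5.0));
        SlopeOneSketch ex = new SlopeOneSketch(explicit);
        System.out.printf(Locale.ROOT, "explicit: %.3f%n", ex.predict(explicit.get("u3"), "A")); // explicit: 4.333

        // Same shape, but every preference is just "present" (1.0).
        Map<String, Map<String, Double>> binary = Map.of(
            "u1", Map.of("A", 1.0, "B", 1.0, "C", 1.0),
            "u2", Map.of("A", 1.0, "B", 1.0),
            "u3", Map.of("B", 1.0, "C", 1.0));
        SlopeOneSketch bin = new SlopeOneSketch(binary);
        System.out.printf(Locale.ROOT, "binary: %.3f%n", bin.predict(binary.get("u3"), "A")); // binary: 1.000
    }
}
```

With presence-only data, every candidate item gets the same score of 1. A binary setting therefore needs a different similarity, such as a co-occurrence-based one, rather than plain slope-one.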
-----Original Message-----
From: Sean Owen [mailto:[email protected]]
Sent: Monday, January 19, 2009 5:36 PM
To: [email protected]
Subject: Re: RE: RE: [jira] Commented: (MAHOUT-19) Hierarchial clusterer

Oh, so you are really recommending things like books rather than URLs, and the URLs don't have anything directly to do with it? Well, then this is indeed a straightforward CF problem. My favorite CF algorithm at the moment is slope-one: fast, good recommendations, and fairly resilient to noise.

On Mon, Jan 19, 2009 at 11:44 AM, Goel, Ankur <[email protected]> wrote:
> Not all URLs represent unique items / entities of interest. For example, a
> lot of URLs would be just site-specific search/listing pages, or pages
> that have a lot of navigational information but do not actually
> represent an entity or item of interest.
>
> Given such a page, we do not want to recommend links to items already on
> the page, but items that appear far ahead (listing pages 3, 4) and were
> also liked most by the users on the site.
>
> Also, for a URL that does represent a unique entity (e.g. a book on
> Amazon), we do not want to recommend other search/listing/navigational
> pages, but pages with actual items that people have liked w.r.t. the
> current page.
>
> The intent is to gauge the relative popularity, or model the
> co-occurrence, of items with respect to each other, and also to remove
> the anomalies.
>
> Let's say A = book1, C = listing-page, B = book2, D = book3.
>
> So if we have patterns like A-C-B, B-C-D-A, A-C-D-B, then A and B can
> both be recommended for each other, provided that neither page already
> links to the other.
>
> Whether or not a Markov chain will work, I do not know; I need to read
> up on Markov chains and find out.
>
> As for log-likelihood ratio tests, that sounds like a reasonable
> candidate, but I am a bit worried about scalability.
>
> Ted, what are your thoughts on this?
>
> Thanks
> -Ankur
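Ankur's A-C-B example can be made concrete with a minimal sketch. It assumes each session's click path is available as a list of page ids and that the non-item pages (here just C) are already known. `CooccurrenceSketch`, `score` and the "|"-joined pair keys are hypothetical, and item ids are assumed not to contain "|". The sketch drops the listing page and counts how many sessions each pair of real items co-occurs in. It then scores a pair with Dunning's log-likelihood ratio (G²), computed on the 2x2 table of sessions that contain both items, only one of them, or neither.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration: session co-occurrence counts scored with Dunning's G^2.
public class CooccurrenceSketch {
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy of a set of counts: N log N - sum(k log k). */
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long k : counts) {
            sum += k;
            parts += xLogX(k);
        }
        return xLogX(sum) - parts;
    }

    /** Log-likelihood ratio (G^2) for the 2x2 table [[k11, k12], [k21, k22]]. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    final Map<String, Long> itemSessions = new TreeMap<>(); // sessions containing each item
    final Map<String, Long> pairSessions = new TreeMap<>(); // sessions containing both items
    long sessions = 0;

    CooccurrenceSketch(List<List<String>> clickPaths, Set<String> nonItemPages) {
        for (List<String> path : clickPaths) {
            // Keep each real item once per session (sorted); drop listing/navigation pages.
            TreeSet<String> items = new TreeSet<>(path);
            items.removeAll(nonItemPages);
            sessions++;
            List<String> sorted = new ArrayList<>(items);
            for (int i = 0; i < sorted.size(); i++) {
                itemSessions.merge(sorted.get(i), 1L, Long::sum);
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairSessions.merge(sorted.get(i) + "|" + sorted.get(j), 1L, Long::sum);
                }
            }
        }
    }

    /** G^2 score for items a and b over all sessions. */
    double score(String a, String b) {
        String key = a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
        long k11 = pairSessions.getOrDefault(key, 0L);
        long na = itemSessions.getOrDefault(a, 0L);
        long nb = itemSessions.getOrDefault(b, 0L);
        return logLikelihoodRatio(k11, na - k11, nb - k11, sessions - na - nb + k11);
    }

    public static void main(String[] args) {
        List<List<String>> paths = List.of(
            List.of("A", "C", "B"),
            List.of("B", "C", "D", "A"),
            List.of("A", "C", "D", "B"));
        CooccurrenceSketch s = new CooccurrenceSketch(paths, Set.of("C"));
        System.out.println(s.pairSessions); // {A|B=3, A|D=2, B|D=2}
        System.out.println(s.score("A", "B"));
    }
}
```

A appears in every one of these three sessions, so all the LLR scores come out as zero. Three sessions carry no evidence either way, and the test only becomes informative at real log volumes. On the scalability worry, pair counting is independent from session to session, which makes it a natural fit for a MapReduce-style count. However, the number of pairs grows quadratically with the number of distinct items in a session.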
