Hi,
----- Original Message ----
> From: Sean Owen <[EMAIL PROTECTED]>
> Subject: Re: Recommending when working with binary data sets
>
> Sorry for the late reply -- I've been traveling.

How dare you! ;)

> On Fri, Sep 26, 2008 at 6:52 PM, Otis Gospodnetic
> wrote:
> > I've been reading the chapter on recommendations in Programming Collective
> > Intelligence and looking at Taste. The examples in PCI
>
> (PS that is a really good book. Recommended -- highly recommended --
> to everyone involved with Mahout. I kinda cross-checked what I had
> done against the book and think it agrees. The book suggested more
> good ideas, particularly the Tanimoto coefficient business.)

Yes, the book is nice, except that the printing I got (it must be the original one; Amazon must have sent me something from their 'old print' pile) is full of mistakes, see http://oreilly.com/catalog/9780596529321/errata/9780596529321.unconfirmed

> > I can't really use Euclidean distance or Pearson correlation coefficient,
> > can I?
>
> You could but it wouldn't make much sense. In the framework I do have
> an implementation of Preference which is supposed to encapsulate a
> binary value like this. Its existence means a 'yes' and as far as the
> framework is concerned means the user expresses a '1.0' preference for
> the item. That value doesn't really matter.
>
> (and yes, it would be more efficient to not have such a simple dummy
> implementation of Preference to represent this. I threw it in since it
> fits cleanly in the framework. Get it right first -- then make it
> fast. If there is interest in these areas then we start making more
> customized versions of User and some of the algorithms that take
> advantage of the fact that preferences are binary.)

I think it would be valuable to put effort into making binary preferences work well. Asking users for item ratings is always a problem (an extra user action, a distraction, low participation), so one often has to gather data by observing behaviour.
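To make the Pearson/Euclidean point concrete: if every expressed preference is stored as 1.0, then over the items two users have in common both value vectors are constant. Pearson's denominator (built from the two variances) becomes zero, and a plain Euclidean distance comes out as 0 for every overlapping pair, so neither measure tells users apart. The sketch below is plain Java written only for illustration; it is not Taste's own similarity code, and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class BinaryPearsonDemo {

    // Plain Pearson correlation over two equal-length arrays holding the
    // values two users gave to the items they have in common.
    static double pearson(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // With binary data every value is 1.0, so varX == varY == 0
        // and this evaluates 0.0 / 0.0, which is NaN in Java.
        return cov / Math.sqrt(varX * varY);
    }

    // Plain Euclidean distance over the same co-rated values.
    static double euclidean(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Two users who both said "yes" to the same three items.
        double[] yesA = {1.0, 1.0, 1.0};
        double[] yesB = {1.0, 1.0, 1.0};
        System.out.println(pearson(yesA, yesB));   // NaN
        System.out.println(euclidean(yesA, yesB)); // 0.0, for any binary overlap

        // Real ratings on the same items give a usable value.
        double[] ratedA = {5.0, 3.0, 1.0};
        double[] ratedB = {4.0, 3.0, 2.0};
        System.out.println(pearson(ratedA, ratedB)); // 1.0
    }
}
```

The only information left in binary data is which items overlap, which is exactly what a set-based measure like Tanimoto/Jaccard uses.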

> > What do people use in such scenarios? Would it make sense to use
> > http://en.wikipedia.org/wiki/Jaccard_index for such cases?
> > ... Ah, I do see javadoc in TanimotoCoefficientSimilarity saying exactly
> > that, good.
> >
> > But then my question is:
> > Doesn't the use of Jaccard/Tanimoto mean going back to the expensive
> > user-user similarity computation?
>
> TanimotoCoefficientSimilarity implements both UserSimilarity and
> ItemSimilarity, so it can be plugged into either a user-based or
> item-based recommender, which need a UserSimilarity or ItemSimilarity,
> respectively. So, no, you aren't forced to user-based recommenders in
> this context.

Thanks. I'll have to give it a try.

Otis
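A minimal sketch of why one Tanimoto (Jaccard) coefficient, |A ∩ B| / |A ∪ B|, can serve both a user-based and an item-based recommender: compared on users it takes each user's set of items, and compared on items it takes each item's set of users, obtained here by inverting the user-to-items map. This is hypothetical illustration code, not the actual TanimotoCoefficientSimilarity implementation; the class and method names, the sample data, and returning 0.0 for two empty sets are all assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TanimotoDemo {

    // Tanimoto (Jaccard) coefficient: |A ∩ B| / |A ∪ B|.
    // Generic, so the same method compares users (sets of items)
    // or items (sets of users).
    static <T> double tanimoto(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumed convention for two empty sets
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    // Invert user -> items into item -> users, for item-item similarity.
    static Map<String, Set<String>> invert(Map<String, Set<String>> userItems) {
        Map<String, Set<String>> itemUsers = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : userItems.entrySet()) {
            for (String item : e.getValue()) {
                itemUsers.computeIfAbsent(item, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return itemUsers;
    }

    public static void main(String[] args) {
        // Binary data: a user either has a "yes" for an item or nothing.
        Map<String, Set<String>> userItems = new HashMap<>();
        userItems.put("alice", Set.of("book1", "book2", "book3"));
        userItems.put("bob", Set.of("book2", "book3", "book4"));
        userItems.put("carol", Set.of("book1", "book2"));

        // User-user: alice and bob share 2 of 4 distinct items.
        System.out.println(tanimoto(userItems.get("alice"), userItems.get("bob"))); // 0.5

        // Item-item: book1 {alice, carol} vs book2 {alice, bob, carol},
        // 2 users in common out of 3.
        Map<String, Set<String>> itemUsers = invert(userItems);
        System.out.println(tanimoto(itemUsers.get("book1"), itemUsers.get("book2")));
    }
}
```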
