Re: Recommending when working with binary data sets

Otis Gospodnetic Tue, 30 Sep 2008 10:09:34 -0700

Hi,



----- Original Message ----
> From: Sean Owen <[EMAIL PROTECTED]>
> Subject: Re: Recommending when working with binary data sets
> 
> Sorry for the late reply -- I've been traveling.

How dare you! ;)

> On Fri, Sep 26, 2008 at 6:52 PM, Otis Gospodnetic
> wrote:
> > I've been reading the chapter on recommendations in Programming Collective
> > Intelligence and looking at Taste.  The examples in PCI
> 
> (PS that is a really good book. Recommended -- highly recommended --
> to everyone involved with Mahout. I kinda cross-checked what I had
> done against the book and think it agrees. The book suggested more
> good ideas, particularly the Tanimoto coefficient business.)

Yes, the book is nice, except that the printing I got (it must be the original
one; Amazon must have sent me something from their 'old print' pile) is full of
mistakes.  See
http://oreilly.com/catalog/9780596529321/errata/9780596529321.unconfirmed

> > I can't really use Euclidean distance or Pearson correlation coefficient,
> > can I?
> 
> You could but it wouldn't make much sense. In the framework I do have
> an implementation of Preference which is supposed to encapsulate a
> binary value like this. Its existence means a 'yes' and as far as the
> framework is concerned means the user expresses a '1.0' preference for
> the item. That value doesn't really matter.
> 
> (and yes, it would be more efficient to not have such a simple dummy
> implementation of Preference to represent this. I threw it in since it
> fits cleanly in the framework. Get it right first -- then make it
> fast. If there is interest in these areas then we start making more
> customized versions of User and some of the algorithms that take
> advantage of the fact that preferences are binary.)

I think it would be valuable to put effort into making binary preferences work
well.  Asking users to rate items is always a problem (extra user action,
distraction, low participation), so one often has to gather data by observing
behaviour instead.

> > What do people use in such scenarios?  Would it make sense to use
> > http://en.wikipedia.org/wiki/Jaccard_index for such cases?
> > ... Ah, I do see javadoc in TanimotoCoefficientSimilarity saying exactly
> > that, good.
> >
> > But then my question is:
> > Doesn't the use of Jaccard/Tanimoto mean going back to the expensive
> > user-user similarity computation?
> 
> TanimotoCoefficientSimilarity implements both UserSimilarity and
> ItemSimilarity, so it can be plugged into either a user-based or
> item-based recommender, which need a UserSimilarity or ItemSimilarity,
> respectively. So, no, you aren't forced to user-based recommenders in
> this context.

Thanks.  I'll have to give it a try.
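
For anyone following along, here is a minimal sketch of the Jaccard/Tanimoto
coefficient on binary data.  It is plain Java and deliberately does not use the
Taste classes; the class name TanimotoSketch, the method name, and the sample
IDs are made up for illustration, and returning 0.0 for two empty sets is my
own assumption.  The same function covers both cases Sean mentions: pass the
sets of items two users touched for a user-user similarity, or the sets of
users who touched two items for an item-item similarity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a hypothetical helper, not part of Taste/Mahout.
final class TanimotoSketch {

    // Tanimoto (Jaccard) coefficient: |A intersect B| / |A union B|.
    static double tanimoto(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumption: no data means no evidence of similarity
        }
        // Iterate over the smaller set and probe the larger one.
        Set<String> smaller = a.size() <= b.size() ? a : b;
        Set<String> larger = (smaller == a) ? b : a;
        int intersection = 0;
        for (String id : smaller) {
            if (larger.contains(id)) {
                intersection++;
            }
        }
        // |A union B| = |A| + |B| - |A intersect B|
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    public static void main(String[] args) {
        // Item-item case: each set holds the users who "said yes" to an item.
        Set<String> usersOfItemX = new HashSet<>(Arrays.asList("u1", "u2", "u3"));
        Set<String> usersOfItemY = new HashSet<>(Arrays.asList("u2", "u3", "u4"));
        // 2 shared users out of 4 distinct users
        System.out.println(tanimoto(usersOfItemX, usersOfItemY));
    }
}
```

Only the presence of an ID in a set matters here, which matches the point above
that the '1.0' preference value itself is irrelevant.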

Otis
