This makes clear the real problem with considering ratings as numerical
values.  In my experience, a reasonable ordering of similarities between two
ratings from different users on the same item is like this:

     both rated the item high
     both rated it low
     one rated high, the other low
     one rated the item, the other did not

I would contend that having opposite ratings is actually a sign of
similarity between the users' tastes rather than a sign of dissimilarity.
It is a weaker sign of similarity than an identical rating, but it is still
an indication of similarity.  This is because users rate items low when the
items are close to what they like but somehow violated their expectations.
Items that are far from their tastes, they simply don't rate at all.
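To make that ordering concrete, here is a minimal sketch of a per-item
agreement score that respects it. The thresholds and weights are purely
illustrative assumptions of mine, not anything from Mahout or the paper:

```python
def agreement(r1, r2, high=4):
    """Toy agreement score for two users' ratings of the same item.

    None means the user did not rate the item. The ordering encoded here
    matches the list above: both-high > both-low > high/low > rated/unrated.
    Thresholds and weights (4, 1.0, 0.7, 0.3) are illustrative assumptions.
    """
    if r1 is None or r2 is None:
        return 0.0          # one rated, the other did not: no evidence
    if r1 >= high and r2 >= high:
        return 1.0          # both rated high: strongest similarity signal
    if r1 < high and r2 < high:
        return 0.7          # both rated low: still a similarity signal
    return 0.3              # high/low discordance: weak but still positive
```

Note that even the discordant case contributes a positive score, which is
the whole point of the argument above.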

Moreover, since negative ratings are relatively rare (typically 5-10% of the
number of high ratings), the both-low condition is rarer still.  You
therefore actually get more mileage out of the high-low discordance.
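Back-of-the-envelope arithmetic shows why, taking 7% as a point within the
5-10% range quoted above and assuming (my assumption, for illustration)
that co-rating events are roughly independent:

```python
# Relative frequencies compared to both-high pairs, under the (assumed)
# independence of the two users' rating events.
p_low = 0.07                 # low ratings per high rating (from the text)
both_low = p_low * p_low     # both-low pairs: ~0.5% as common as both-high
high_low = 2 * p_low         # discordant pairs: ~14% as common as both-high
print(both_low, high_low)
```

So discordant pairs occur on the order of thirty times more often than
both-low pairs, which is why they carry more usable signal in aggregate.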

My conclusion from this is that viable strategies include:

a) use only high ratings and do binary recommendations

b) use any rating at all as a binary recommendation

c) do both (a) and (b) and blend the results with more emphasis on (a).
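Strategy (c) could be sketched as a simple weighted blend. The weight and
the scores here are hypothetical placeholders; in practice the two inputs
would come from whatever binary recommenders implement (a) and (b):

```python
def blend(score_high_only, score_any_rating, w_high=0.7):
    """Blend two binary-recommender scores, emphasizing the
    high-ratings-only model (a) over the any-rating model (b).
    The default weight 0.7 is an illustrative assumption."""
    return w_high * score_high_only + (1 - w_high) * score_any_rating

# Hypothetical example: an item scored 0.9 by model (a), 0.4 by model (b)
print(blend(0.9, 0.4))  # approximately 0.75
```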

On Thu, Feb 11, 2010 at 12:41 PM, Sean Owen <[email protected]> wrote:

> Yes, you are right. It seems counter-intuitive, at first. I might
> argue it is not so counter-intuitive, however.
>
> A similarity of -1 is low, as low as possible. But the fact that the
> two items have any similarity at all is significant. It means there
> are a number of users who have rated both items, although they have
> rated them quite differently. Note that most pairs of items have no
> similarity whatsoever.
>
> So a similarity of -1 is still in a way significant. The result is
> less counter-intuitive when you think of it this way.
>
> On Thu, Feb 11, 2010 at 8:35 PM, Guohua Hao <[email protected]> wrote:
> > Hello Sean,
> >
> > First, I like your tweaks there.
> >
> > Based on your example, I came up with a new extreme case, which may cause
> > some trouble. Suppose the user u has rated several items (e.g., 10 items),
> > all with rating 5, and we want to predict user u's rating for item i,
> > P_{u,i}. If item i's similarities with all those already rated items are
> > the same, and very close to -1, we are still going to get P_{u,i} = 5,
> > because those similarity factors will cancel out. However, this is still
> > counter-intuitive, since we expect P_{u,i} to be very close to 1 (in the
> > 1-5 rating range) with more confidence.
> >
> > Shall we consider this case in the code?
> >
> > Thanks,
> > Guohua
> >
> > On Wed, Feb 10, 2010 at 6:13 PM, Sean Owen <[email protected]> wrote:
> >
> >> Yes, great point. It's bad if there's only one item that the user has
> >> rated that has any similarity to the item being predicted. According
> >> to even the 'corrected' formula, the similarity value doesn't even
> >> matter. It cancels out. That leads to the counter-intuitive
> >> possibility you highlight.
> >>
> >> For that reason GenericItemBasedRecommender won't make a prediction in
> >> this situation. You could argue it's a hack but I feel it should be
> >> undefined in this situation.
> >>
> >> You could certainly throw out 3.2.1 entirely and think up something
> >> better, though I think with the two tweaks I've described here, its
> >> core logic is simple and remains sound.
> >>
> >> Sean
> >>
> >>
> >> On Thu, Feb 11, 2010 at 12:04 AM, Guohua Hao <[email protected]> wrote:
> >> > I think you brought up a good point as to dealing with negative
> >> > similarities, which I had not realized before. Here is my other
> >> > thought. Based on your example and the proposed method, we will get a
> >> > predicted rating of 5 in such a case after normalization. This seems
> >> > counter-intuitive to me, since we know that these two items are very
> >> > dissimilar (actually oppositely correlated); a predicted rating close
> >> > to 1 would be more intuitive to me. Maybe we need to think more about
> >> > the expression in section 3.2.1 of that paper.
> >>
> >
>



-- 
Ted Dunning, CTO
DeepDyve
