Yes, you are right. It seems counter-intuitive at first, but I might argue it is not so counter-intuitive.
A similarity of -1 is as low as possible. But the fact that the two items have any similarity at all is significant: it means a number of users have rated both items, even though they rated them quite differently. Note that most pairs of items have no similarity whatsoever, so a similarity of -1 is still meaningful in that sense. The result is less counter-intuitive when you think of it this way.

On Thu, Feb 11, 2010 at 8:35 PM, Guohua Hao <[email protected]> wrote:
> Hello Sean,
>
> First, I like your tweaks there.
>
> Based on your example, I came up with a new extreme case, which may cause
> some trouble. Suppose user u has rated several items (e.g., 10 items),
> all with rating 5, and we want to predict user u's rating for item i,
> P_{u,i}. If item i's similarities with all of those already-rated items
> are the same, and very close to -1, we will still get P_{u,i} = 5,
> because those similarity factors cancel out. This is still
> counter-intuitive, since we would expect P_{u,i} to be very close to 1
> (in the 1-5 rating range), with more confidence.
>
> Shall we consider this case in the code?
>
> Thanks,
> Guohua
>
> On Wed, Feb 10, 2010 at 6:13 PM, Sean Owen <[email protected]> wrote:
>
>> Yes, great point. It's bad if there's only one item the user has
>> rated that has any similarity to the item being predicted. According
>> to even the 'corrected' formula, the similarity value doesn't even
>> matter. It cancels out. That leads to the counter-intuitive
>> possibility you highlight.
>>
>> For that reason, GenericItemBasedRecommender won't make a prediction
>> in this situation. You could argue it's a hack, but I feel the result
>> should be undefined in this situation.
>>
>> You could certainly throw out 3.2.1 entirely and think up something
>> better, though I think that with the two tweaks I've described here,
>> its core logic is simple and remains sound.
>>
>> Sean
>>
>>
>> On Thu, Feb 11, 2010 at 12:04 AM, Guohua Hao <[email protected]> wrote:
>> > I think you brought up a good point about dealing with negative
>> > similarities, which I had not realized before. Here is my other
>> > thought. Based on your example and the proposed method, we will get a
>> > predicted rating of 5 in such a case after normalization. This seems
>> > counter-intuitive to me: since we know these two items are very
>> > dissimilar (actually oppositely correlated), a predicted rating close
>> > to 1 would be more intuitive. Maybe we need to think more about the
>> > expression in section 3.2.1 of that paper.
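For reference, the cancellation being discussed can be shown numerically. The exact expression from section 3.2.1 of the paper isn't reproduced in the thread, so this is only a sketch contrasting the two obvious normalizations of a similarity-weighted average: dividing by the plain sum of similarities versus by the sum of their absolute values. Function names here are illustrative, not Mahout's.

```python
def predict_plain(ratings, sims):
    """Weighted average normalized by the plain sum of similarities."""
    return sum(s * r for s, r in zip(sims, ratings)) / sum(sims)

def predict_abs(ratings, sims):
    """Same numerator, but normalized by the sum of absolute similarities."""
    return sum(s * r for s, r in zip(sims, ratings)) / sum(abs(s) for s in sims)

# Guohua's extreme case: user u rated ten items, all with 5, and item i
# has similarity exactly -1 to each of them.
ratings = [5.0] * 10
sims = [-1.0] * 10

# With plain normalization the -1 factors cancel, predicting 5 despite
# the strong negative correlation -- the counter-intuitive outcome.
print(predict_plain(ratings, sims))  # 5.0

# With absolute-value normalization the sign survives; the raw value -5
# would be clamped to the bottom of the 1-5 range, matching the
# intuition that the prediction should be near 1.
print(predict_abs(ratings, sims))    # -5.0

# Single-item case from Sean's point: the similarity value cancels
# entirely, so the prediction is just the lone rating.
print(predict_plain([5.0], [-1.0]))  # 5.0
```

This also makes concrete why refusing to predict (leaving the result undefined, as GenericItemBasedRecommender does) is defensible when only one co-rated item exists: the similarity contributes nothing to the estimate.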
