Re: weighted score

Tamas Jambor Mon, 22 Feb 2010 08:19:14 -0800

What if the weights are 1,1,-1,-1? The estimate is -2 then. This is
why I say this won't work.

you could trim the result so you would get 1, which is the same as what you get with your approach.
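
For illustration, a minimal sketch of the two variants being compared. The ratings (1, 1, 5, 5) and the 1..5 scale are assumptions picked so that the arithmetic reproduces the -2 and 1 mentioned above; the thread does not state the actual ratings, and the function names are hypothetical, not Mahout's API.

```python
def estimate_abs_normalised(ratings, sims):
    """Weighted sum of ratings, normalised by the sum of absolute weights."""
    norm = sum(abs(s) for s in sims)
    if norm == 0:
        return None  # no usable weights at all
    return sum(s * r for s, r in zip(sims, ratings)) / norm


def estimate_shifted(ratings, sims, shift=1.0):
    """Weighted average after shifting every similarity by `shift` (the "+1" idea)."""
    weights = [s + shift for s in sims]
    total = sum(weights)
    if total == 0:
        return None  # ill-defined: weights cancel out
    return sum(w * r for w, r in zip(weights, ratings)) / total


def trim(value, lo=1.0, hi=5.0):
    """Clamp an estimate into the assumed rating range."""
    return max(lo, min(hi, value))


ratings = [1, 1, 5, 5]   # assumed ratings, for illustration only
sims = [1, 1, -1, -1]    # the weights from the example above

raw = estimate_abs_normalised(ratings, sims)   # (1 + 1 - 5 - 5) / 4
trimmed = trim(raw)
shifted = estimate_shifted(ratings, sims)      # weights become 2, 2, 0, 0
unshifted = estimate_shifted(ratings, sims, shift=0.0)  # weights sum to zero
```

Under these assumptions the trimmed estimate and the "+1" estimate coincide, while the plain "+0" weighted average has a zero denominator.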

While in general I could ask why 2 is necessarily the "wrong" answer
and 1 is "right" -- in the case of Pearson I agree that 1 is the right
answer. This isn't necessarily true for other similarity measures,
where 0 doesn't have to mean "no mutual information".

if you take cosine similarity, 0 means that the vectors are independent, which also implies that there is no mutual information, although cosine cannot be negative in this case.
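
A small sketch of the difference, assuming non-negative rating vectors over co-rated items. The vectors are made up for illustration. Cosine on non-negative ratings stays in [0, 1], with 0 meaning the vectors are orthogonal, while Pearson (cosine on mean-centred vectors) can go negative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson(a, b):
    """Pearson correlation, computed as cosine of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


a = [1, 2, 3]  # hypothetical ratings by one user
b = [3, 2, 1]  # hypothetical ratings by another user, opposite trend

cos_ab = cosine(a, b)      # positive, because all ratings are non-negative
pearson_ab = pearson(a, b) # negative, because the trends are opposite
```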

In the world of users, I would argue that a similarity of 0, even when
it is a 0 from a Pearson correlation, means there is *some*
relationship between the two users -- they overlap in some items out
of the very many out there, which is a positive association. So,
factoring in uncorrelated users is, I would say, more valid than
ignoring them. That's one reason I actually like the effect of the
"+1" over "+0".

it is true that in my case this modification doesn't have that distinctive an effect on the probability values when using the user-based recommender.

I think this is less true for items, as you say, since in many cases
(like yours I think) there are more users than items. It is more
likely to be able to compute some similarity between items; the
existence of any similarity at all means less. The "+1" could distort
more than "+0" -- but again I am not sure what else to do as "+0"
leads to ill-defined results.

I agree that in this case you take into account information that is not relevant, and that distorts the result. actually I just ran a quick evaluation comparing the two approaches. I used some IR measures to evaluate. In each row, the first value is your approach and the second is the one I mentioned in the previous email.


(movielens 1m dataset, item-based, pearson)

n...@10 - 0.7285387375322516, 0.7634082972451305
n...@5 - 0.6967904224170633, 0.7549089634943423
precis...@10 - 0.6995672012037591, 0.7168691567887784
precis...@5 - 0.6973511214230481, 0.7418986852281506
MRR - 0.7998089019219268, 0.8685472365056628

(movielens 1m dataset, user-based (neighbourhood size - 200),  pearson)

n...@10 - 0.7646000398212323, 0.7644630753602327
n...@5 - 0.7423746185882404, 0.7420668850370802
precis...@10 - 0.7225191341799975, 0.7224362566729521
precis...@5 - 0.7364440024310716, 0.7360793414000781
MRR - 0.8422149426811263, 0.8421165522048173
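
For reference, a minimal sketch of how two of these measures are commonly computed. The thread does not say how relevance was defined or how the rankings were produced, so the inputs below are hypothetical and the function names are illustrative only.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is found."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results):
    """Average reciprocal rank over (ranked list, relevant set) pairs, one per user."""
    return sum(reciprocal_rank(r, rel) for r, rel in results) / len(results)


# Hypothetical recommendation lists for two users.
user_results = [
    (["a", "b", "c", "d"], {"b", "c"}),  # first relevant item at rank 2
    (["e", "f", "g", "h"], {"e"}),       # first relevant item at rank 1
]

p_at_2 = precision_at_k(user_results[0][0], user_results[0][1], 2)
mrr = mean_reciprocal_rank(user_results)
```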
