What if the weights are 1,1,-1,-1? The estimate is -2 then. This is
why I say this won't work.
You could trim the result so that you get 1, which is the same as what
you get with your approach.
While in general I could ask why 2 is necessarily the "wrong" answer
and 1 is "right" -- in the case of Pearson I agree that 1 is the right
answer. This isn't necessarily true for other similarity measures,
where 0 doesn't have to mean "no mutual information".
If you take cosine similarity, 0 means that the vectors are orthogonal,
which likewise implies that there is no mutual information, although
cosine cannot be negative in this case.
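
To make the cosine point concrete, here is a minimal sketch. The rating vectors are hypothetical and assume non-negative ratings with 0 standing for "not rated"; under that assumption the cosine is 0 exactly when two vectors share no rated items, and it can never go below 0.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical rating vectors over four items; 0 means "not rated".
u = [5, 3, 0, 0]
v = [0, 0, 4, 2]
w = [4, 0, 1, 0]

no_overlap = cosine(u, v)    # no co-rated items, so the dot product is 0
some_overlap = cosine(u, w)  # one co-rated item, so the value is positive
```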
In the world of users, I would argue that a similarity of 0, even when
it is a 0 from a Pearson correlation, means there is *some*
relationship between the two users -- they overlap in some items out
of the very many out there, which is a positive association. So,
factoring in uncorrelated users is, I would say, more valid than
ignoring them. That's one reason I actually like the effect of the
"+1" over "+0".
It is true that in my case this modification doesn't have such a
noticeable effect on the probability values when using the user-based
recommender.
I think this is less true for items, as you say, since in many cases
(like yours I think) there are more users than items. It is more
likely to be able to compute some similarity between items; the
existence of any similarity at all means less. The "+1" could distort
more than "+0" -- but again I am not sure what else to do as "+0"
leads to ill-defined results.
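
A small sketch of why "+0" is ill-defined for the weights 1,1,-1,-1. It assumes the estimate is a similarity-weighted average of neighbour preferences and that "+1" means adding 1 to every similarity before weighting. The function name `estimate`, the `shift` parameter, and the preference values are hypothetical; this is not the recommender's actual implementation.

```python
def estimate(prefs, sims, shift):
    """Similarity-weighted average of prefs, using weight = sim + shift.

    Returns None when the weights sum to zero, i.e. the estimate is undefined.
    """
    weights = [s + shift for s in sims]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * p for w, p in zip(weights, prefs)) / total

sims = [1.0, 1.0, -1.0, -1.0]   # the weights from the example in this thread
prefs = [4.0, 5.0, 2.0, 1.0]    # hypothetical preference values

plus_zero = estimate(prefs, sims, shift=0.0)  # weights cancel: undefined
plus_one = estimate(prefs, sims, shift=1.0)   # weights become 2, 2, 0, 0
```

With the shift, the denominator can only vanish if every similarity is exactly -1. The cost is that neighbours with similarity 0 now carry weight 1, which is the distortion discussed above.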
I agree that in this case you take into account information that is not
relevant, and that distorts the result. Actually, I just ran a quick
evaluation comparing the two approaches, using some IR measures. In
each row the first value is your approach, and the second is the one I
mentioned in the previous email.
(MovieLens 1M dataset, item-based, Pearson)
n...@10      - 0.7285387375322516, 0.7634082972451305
n...@5       - 0.6967904224170633, 0.7549089634943423
precis...@10 - 0.6995672012037591, 0.7168691567887784
precis...@5  - 0.6973511214230481, 0.7418986852281506
MRR          - 0.7998089019219268, 0.8685472365056628
(MovieLens 1M dataset, user-based (neighbourhood size - 200), Pearson)
n...@10      - 0.7646000398212323, 0.7644630753602327
n...@5       - 0.7423746185882404, 0.7420668850370802
precis...@10 - 0.7225191341799975, 0.7224362566729521
precis...@5  - 0.7364440024310716, 0.7360793414000781
MRR          - 0.8422149426811263, 0.8421165522048173
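
For readers unfamiliar with the measures, here is a minimal sketch of two of them. It assumes the rows labelled precis...@k are precision at k and that MRR is mean reciprocal rank, both computed in the textbook way. The function names and the toy rankings are hypothetical and unrelated to the evaluation code that produced the numbers above.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked list, relevant set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Toy data: one ranked recommendation list and relevant set per user.
runs = [
    (["a", "b", "c", "d", "e"], {"a", "c"}),
    (["f", "g", "h", "i", "j"], {"g"}),
]

p5 = precision_at_k(runs[0][0], runs[0][1], 5)  # 2 relevant in top 5
mrr = mean_reciprocal_rank(runs)                # first hits at ranks 1 and 2
```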