The cosine similarity can be negative, but yes 0 here also means no
relation. This isn't true of, say, a Euclidean distance-based measure.
In this case we only have positive ratings, so all the vectors stay in
the non-negative part of the space, which means the cosine similarity
can't be negative.
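
To make the point above concrete, here is a minimal sketch in Python. The function name and the sample ratings are made up for illustration and are not taken from any library. It shows that the cosine of two all-positive rating vectors is positive, while the same two vectors can give a negative value once each is mean-centered (which is roughly what Pearson does).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Undefined when either vector is all zeros.
        return math.nan
    return dot / (norm_a * norm_b)


# Hypothetical ratings on a 1-5 scale: two users with opposite tastes.
raw_a = [5.0, 3.0, 1.0]
raw_b = [1.0, 3.0, 5.0]

# Every term of the dot product is positive, so the cosine is positive.
raw_similarity = cosine_similarity(raw_a, raw_b)

# Subtracting each user's mean rating (3.0 here) lets components go negative,
# and the similarity can then go negative as well.
centered_a = [x - 3.0 for x in raw_a]
centered_b = [x - 3.0 for x in raw_b]
centered_similarity = cosine_similarity(centered_a, centered_b)

print(raw_similarity, centered_similarity)
```

So with raw positive ratings the two users still look somewhat similar, and only the centered version reports them as opposites.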
evaluate. The first one is your approach, and the second one is the one I
mentioned in the previous email.
How are you dealing with negative/undefined predictions though? I am
also not sure what defining it this way does to the accuracy of estimated
preference values, which is what the evaluator would test. This tends
to push estimates towards negative values.
I could believe it works a little better for your data set -- so you
should use your variation, especially if you only care about precision
and recall. I don't know that this is going to be better in general --
honestly don't know, I haven't studied this. Failing that, I am just
having a hard time implementing this as an ill-defined weighted
average, no matter what it seems to do to one data set.
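
A small sketch of why the weighted average becomes ill-defined once weights can be negative. This is plain Python with made-up numbers and a hypothetical function name, assuming the estimate is the similarity-weighted mean of neighbors' ratings. It is not the code of any particular recommender.

```python
import math


def weighted_average(ratings, similarities):
    """Similarity-weighted mean of ratings; NaN when the weights sum to zero."""
    total_weight = sum(similarities)
    if total_weight == 0.0:
        # No sensible estimate: the denominator vanishes.
        return math.nan
    weighted_sum = sum(r * s for r, s in zip(ratings, similarities))
    return weighted_sum / total_weight


# All weights positive: the estimate stays between the smallest and
# largest neighbor rating, as a weighted average should.
in_range = weighted_average([4.0, 5.0], [0.9, 0.5])

# One negative weight: the estimate can leave the rating scale entirely,
# here falling below zero even though both ratings are positive.
out_of_range = weighted_average([5.0, 1.0], [0.2, -0.9])

# Weights that cancel exactly: the estimate is undefined.
undefined = weighted_average([4.0, 5.0], [0.5, -0.5])

print(in_range, out_of_range, undefined)
```

With only positive weights the result is a true average. With mixed signs it is no longer bounded by the inputs, which is the negative/undefined prediction problem raised above.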
It is also arguable whether it is better to count negative correlation
towards the score, especially with Pearson.
Maybe the best solution would be to dump all the negative values as not
meaningful. But I personally think
that leaving zero correlation as zero is quite important.
Tamas