For me, one good practical argument would be, as you mentioned, that we don't have to deal with negative similarities, which are still a problem for me.

My understanding of the question is that when you center the data, you can interpret Pearson correlation as cosine similarity, but in fact that has nothing to do with cosine similarity in the sense of the original definition, since we have transformed the vectors so their direction is different.

T

On 03/03/2010 08:37, Sean Owen wrote:
That's right. I used to have separate implementations.

This might be a good question to ask the experts: while based on my
understanding of the issues, it seems a bit better to compute the
cosine measure based on centered (mean = 0) data, I wonder if there
are good arguments for not doing this?

Instead of computing the cosine of the angle between user vectors A
and B, it would be computing the cosine of the angle between A+m_A and
B+m_B, where m_A and m_B are vectors whose entries are just the
average of elements in A and B respectively. It pushes the A and B
endpoints out in the direction of (1,1,...1) or (-1,-1,...,-1).

This narrows the angle, and makes similarities tend towards 1. In
fact for the case where ratings are in, say, [0,10], the angle never
gets past 90 degrees, so the similarity is in [0,1] and not even
[-1,1]. It feels like it's losing a bit of dynamic range, but that's
got to be a minor issue.

That is, the greater the average preference, the less difference in
preference will matter to the similarity measure. The only aspect of
this I don't like is that differences in preferences at the small end
of the range matter much more, which doesn't seem intuitively right.
The similarity between users who rated two movies (0,1) and (1,0) is
as low as possible -- 0 -- while the similarity between users who
rated two movies (9,10) and (10,9) is nearly 1. But in both cases they
rated two movies quite similarly on a scale of 0 to 10. With
centering, the result would have been identical.
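
The two-movie example above can be checked numerically. What follows is only a minimal sketch in plain Python, not the implementation discussed in this thread; the helper names (cosine, center, centered_cosine) are made up for illustration, and centering is assumed to mean subtracting each user's own mean rating.

```python
import math

def cosine(a, b):
    """Plain (uncentered) cosine of the angle between rating vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def center(v):
    """Subtract the vector's own mean from every entry (assumed centering)."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def centered_cosine(a, b):
    """Cosine computed after centering each vector on its own mean."""
    return cosine(center(a), center(b))

low = ([0, 1], [1, 0])      # disagreement at the small end of the scale
high = ([9, 10], [10, 9])   # same-sized disagreement at the large end

print("uncentered:", cosine(*low), cosine(*high))
print("centered:  ", centered_cosine(*low), centered_cosine(*high))
```

Uncentered, the first pair comes out at 0 and the second at 180/181, roughly 0.994. Centered, both pairs reduce to (-0.5, 0.5) against (0.5, -0.5), so they get the same value, which here is -1 up to floating-point rounding.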



On Wed, Mar 3, 2010 at 12:19 AM, Tamas Jambor<[email protected]>  wrote:
sure, if you center the data then they are identical. but the uncentered
cosine similarity is quite different, as far as I know.

T

On 02/03/2010 22:55, Sean Owen wrote:
Yes, that's also the Pearson-correlation-based one, since it forces
the data to be 'centered' (mean of 0) during the computation. In that
case they are actually identical.
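
The claim that the two are identical on centered data can be illustrated with a small sketch. This is an assumption-laden example in plain Python, not the code referred to in this thread: the function names and the sample ratings are hypothetical, both users are assumed to have rated the same items, and neither vector may be constant (otherwise the denominator is zero).

```python
import math

def pearson(a, b):
    """Pearson correlation via the usual sums-of-products formula."""
    n = len(a)
    sum_a, sum_b = sum(a), sum(b)
    sum_ab = sum(x * y for x, y in zip(a, b))
    sum_a2 = sum(x * x for x in a)
    sum_b2 = sum(y * y for y in b)
    numerator = n * sum_ab - sum_a * sum_b
    denominator = math.sqrt((n * sum_a2 - sum_a ** 2) * (n * sum_b2 - sum_b ** 2))
    return numerator / denominator

def centered_cosine(a, b):
    """Cosine of the angle between a and b after subtracting each one's mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(ca, cb))
    return dot / (math.sqrt(sum(x * x for x in ca)) * math.sqrt(sum(y * y for y in cb)))

# Hypothetical ratings by two users for the same four items.
user_a = [5, 3, 4, 4]
user_b = [3, 1, 2, 3]

print(pearson(user_a, user_b))
print(centered_cosine(user_a, user_b))
```

The two printed values agree up to floating-point rounding, since dividing the Pearson numerator and denominator by n gives exactly the dot product and norms of the mean-centered vectors.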

On Tue, Mar 2, 2010 at 10:47 PM, Tamas Jambor<[email protected]> wrote:

Thanks, that makes sense. Which one would be cosine similarity? Do you
have that implemented?


