That's right. I used to have separate implementations. This might be a good question to put to the experts: based on my understanding of the issues, it seems a bit better to compute the cosine measure on centered (mean = 0) data, but I wonder if there are good arguments for not doing this?
Instead of computing the cosine of the angle between the centered user vectors A and B, the uncentered version computes the cosine of the angle between A + m_A and B + m_B, where m_A and m_B are constant vectors whose entries are user A's and user B's average rating, respectively. Adding the means pushes the endpoints of A and B out in the direction of (1,1,...,1) (or (-1,-1,...,-1) when the means are negative). This narrows the angle and makes similarities tend towards 1. In fact, when ratings are in, say, [0,10], all coordinates are nonnegative, so the angle never gets past 90 degrees and the similarity lies in [0,1], not even [-1,1]. It feels like that loses a bit of dynamic range, but that's got to be a minor issue.

Put another way: the greater the average preference, the less a difference in preference matters to the similarity measure. The one aspect of this I don't like is that differences in preference at the small end of the range matter much more, which doesn't seem intuitively right. The similarity between users who rated two movies (0,1) and (1,0) is as low as possible, 0, while the similarity between users who rated the movies (9,10) and (10,9) is nearly 1 (180/181). Yet in both cases the two users rated the movies quite similarly on a scale of 0 to 10. With centering, the two results would have been identical (both -1). (See the small sketch at the end of this message.)

On Wed, Mar 3, 2010 at 12:19 AM, Tamas Jambor <[email protected]> wrote:
> Sure, if you center the data then they are identical, but the uncentered
> cosine similarity is quite different, as far as I know.
>
> T
>
> On 02/03/2010 22:55, Sean Owen wrote:
>>
>> Yes, that's also the Pearson-correlation-based one, since it forces
>> the data to be 'centered' (mean of 0) during the computation. In that
>> case they are actually identical.
>>
>> On Tue, Mar 2, 2010 at 10:47 PM, Tamas Jambor <[email protected]> wrote:
>>
>>>
>>> Thanks, that makes sense. Which one would be cosine similarity? Do you
>>> have that implemented?
>>>
>
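P.S. For concreteness, here is a small sketch of the two computations on the example pairs above. This is just illustrative Python, not the Mahout implementation; it assumes non-zero, non-constant vectors so the norms never vanish.

import math

def cosine(a, b):
    # Cosine of the angle between vectors a and b.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centered(v):
    # Subtract the mean so the entries of v average to 0.
    m = sum(v) / len(v)
    return [x - m for x in v]

# Uncentered cosine: disagreement at the low end of the rating
# scale is punished far more than the same disagreement high up.
print(cosine([0, 1], [1, 0]))    # 0.0
print(cosine([9, 10], [10, 9]))  # 0.9944... (= 180/181)

# Centered cosine (the Pearson-correlation-like version):
# both pairs come out identical.
print(cosine(centered([0, 1]), centered([1, 0])))    # -1.0
print(cosine(centered([9, 10]), centered([10, 9])))  # -1.0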
