Thanks, that makes sense. Which one would be cosine similarity? Do you have that implemented?

Tamas

On 02/03/2010 21:31, Sean Owen wrote:
My guess is that it is due to the fact that the Pearson correlation is
undefined for two series of just one data point -- that is, two items
or users that overlap in only one preference will have no similarity.
Very 'isolated' items have no similarity to anything else and no
estimate can be produced.

The Euclidean distance-based metric doesn't have this property.
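
To make the difference concrete, here is a minimal sketch in plain Python. It is not Mahout's code: the function names, the sample preference values, and the 1 / (1 + d) mapping from distance to similarity are all assumptions chosen for illustration. It shows that with a single overlapping preference the Pearson denominator is zero, so the correlation is undefined (returned as NaN here), while a distance-based similarity still yields a number.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; NaN when undefined."""
    n = len(xs)
    if n == 0:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # With one data point both variances are 0, so the ratio is undefined.
    return cov / denom if denom > 0 else float("nan")


def euclidean_similarity(xs, ys):
    """Illustrative mapping of Euclidean distance d to a similarity 1 / (1 + d)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))
    return 1.0 / (1.0 + d)


def overlap(prefs_a, prefs_b):
    """Return the two users' ratings restricted to the items both have rated."""
    common = sorted(prefs_a.keys() & prefs_b.keys())
    return [prefs_a[i] for i in common], [prefs_b[i] for i in common]


# Hypothetical users who share exactly one rated item ("item2").
user_a = {"item1": 4.0, "item2": 3.0}
user_b = {"item2": 5.0, "item3": 1.0}

xs, ys = overlap(user_a, user_b)
print(pearson(xs, ys))               # undefined, reported as nan
print(euclidean_similarity(xs, ys))  # still a finite value
```

A user pair like this contributes nothing to a Pearson-based neighbourhood but does contribute to a Euclidean one, which would be consistent with the higher coverage reported below.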



On Tue, Mar 2, 2010 at 9:25 PM, Tamas Jambor <[email protected]> wrote:
Hi Sean,

I have just tried out the EuclideanDistanceSimilarity method to calculate
user similarity, but there is something strange I don't understand. I use
200 as the neighbourhood size, and within this neighbourhood I get a
prediction for around 75% of my test items using Pearson correlation, but
with this new one, I get almost 95% covered for the same dataset. Just
wondering why, because I would expect the same proportion, since the way
that the algorithm calculates predictions did not change.

thanks,
Tamas

