My guess is that this is because the Pearson correlation is undefined
for two series of just one data point. That is, two items or users that
overlap in only one preference have no defined similarity. Very
'isolated' items then have no similarity to anything else, so no
estimate can be produced for them.

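The Euclidean distance-based metric doesn't have this limitation: it is
still defined for a single overlapping preference, which would explain
the higher coverage.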
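
As a rough illustration of the point above (a standalone sketch, not
Mahout's actual code), here is the single-overlap case worked through.
The function names, the sample users, and the 1 / (1 + distance) form of
the Euclidean similarity are all assumptions made for the example; the
library's exact formula may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series, or None if undefined."""
    n = len(xs)
    if n == 0:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # With a single data point both variances are zero, so the
    # correlation is 0/0 and cannot be computed.
    if denom == 0:
        return None
    return cov / denom


def euclidean_similarity(xs, ys):
    """Assumed form: 1 / (1 + Euclidean distance), None if nothing overlaps."""
    if not xs:
        return None
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))
    return 1.0 / (1.0 + dist)


def overlap(prefs_a, prefs_b):
    """Ratings of both users restricted to the items they have both rated."""
    shared = sorted(set(prefs_a) & set(prefs_b))
    return [prefs_a[i] for i in shared], [prefs_b[i] for i in shared]


# Hypothetical users who share exactly one rated item ("item2").
user_a = {"item1": 4.0, "item2": 3.0}
user_b = {"item2": 5.0, "item3": 1.0}

xs, ys = overlap(user_a, user_b)
single_pearson = pearson(xs, ys)               # undefined for one point
single_euclid = euclidean_similarity(xs, ys)   # still a usable number

print("Pearson:", single_pearson)
print("Euclidean-based:", single_euclid)
```

With Pearson, this pair contributes nothing to the neighbourhood, while
the distance-based metric still yields a value, so more test items end
up with at least one usable neighbour.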



On Tue, Mar 2, 2010 at 9:25 PM, Tamas Jambor <[email protected]> wrote:
> hi Sean,
>
> I have just tried out the EuclideanDistanceSimilarity method to calculate
> user similarity, but there is something strange I don't understand. I use
> 200 as the neighbourhood size, and within this neighbourhood I get a
> prediction for around 75% of my test items using Pearson correlation, but
> with this new one, I get almost 95% covered for the same dataset. Just
> wondering why, because I would expect the same proportion, since the way
> that the algorithm calculates prediction did not change.
>
> thanks,
> Tamas
>
