Re: EuclideanDistanceSimilarity

Sean Owen Wed, 03 Mar 2010 05:51:07 -0800

On Wed, Mar 3, 2010 at 1:45 PM, Tamas Jambor <[email protected]> wrote:
> my problem is that using Pearson correlation with adjusted weighted average
> prediction (i.e. not shifting the weights) results in a very bad RMSE,
> especially for item-based (but I get surprisingly good results for other
> measures), so surely something is wrong with negative weights.


Yes, that alone is not sufficient -- you'd need to cap the estimated
preference value at the minimum / maximum possible values. Otherwise
the RMSE is unbounded, since the estimate can be arbitrarily
large or small.
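
To make the problem concrete, here is a minimal sketch in Python (not
Mahout's actual code; the function names and the 1-5 preference range
are assumptions for illustration). It shows a weighted average with one
negative weight leaving the valid range, and a cap pulling it back:

```python
def weighted_average(ratings, weights):
    # Plain weighted average: sum(w * r) / sum(w).
    # With mixed-sign weights the denominator can get close to zero.
    return sum(w * r for r, w in zip(ratings, weights)) / sum(weights)


def cap(estimate, min_pref, max_pref):
    # Clamp the estimate into [min_pref, max_pref].
    return max(min_pref, min(max_pref, estimate))


# Hypothetical data: two neighbours, one with a negative similarity.
ratings = [5.0, 1.0]
weights = [0.9, -0.8]

raw = weighted_average(ratings, weights)  # far above 5 on a 1-5 scale
capped = cap(raw, 1.0, 5.0)               # clamped to the top of the scale
```

The weights nearly cancel in the denominator, which is what lets the
raw estimate grow without bound.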

My patch for issue 321 will address this, such that you can safely use
Pearson. Or you can hack it into your own customized version.

> however, just tried cosine similarity and EuclideanDistanceSimilarity for
> item-based, they are OK in terms of RMSE (around 1.03 for both), although
> EuclideanDistanceSimilarity performs slightly better on other measures
> (e.g. NDCG, MAP, precision, etc).

Euclidean distance similarity works because its weights aren't
negative. Even if you aren't capping the estimates, you won't get such
out-of-bounds estimates.
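
The reasoning, written out (the symbols w_i for weights and r_i for
preference values are my notation, not anything from the code): with
nonnegative weights that are not all zero, the weighted average is a
convex combination of the r_i, so it cannot leave their range.

```latex
w_i \ge 0,\ \sum_i w_i > 0
\quad\Longrightarrow\quad
\min_i r_i \;\le\; \frac{\sum_i w_i r_i}{\sum_i w_i} \;\le\; \max_i r_i
```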

(Uncentered) Cosine measure similarity works for the same reason,
assuming your preference value range is nonnegative -- the cosine is
never negative then.
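
A small sketch of the difference, again in Python with made-up
preference vectors (the helper names are hypothetical). Pearson
correlation is computed here as the cosine of the mean-centered
vectors, which is why it can go negative even when every preference
value is nonnegative:

```python
import math


def cosine(a, b):
    # Uncentered cosine: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(a, b):
    # Pearson correlation as the cosine of the mean-centered vectors.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical preferences on a 1-5 scale that move in opposite directions.
item_a = [5.0, 1.0, 4.0]
item_b = [1.0, 5.0, 2.0]

uncentered = cosine(item_a, item_b)  # positive: all terms in the dot product are >= 0
centered = pearson(item_a, item_b)   # negative: centering introduces negative values
```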

The real solution isn't to use uncentered data, but to fully
implement support for negative weights, which means capping.
