[
https://issues.apache.org/jira/browse/MAHOUT-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084922#comment-14084922
]
Ted Dunning commented on MAHOUT-1602:
-------------------------------------
Is the comment wrong or is the code wrong?
Sounds like it may actually be the comment. I say this because
{code}
sqrt(sum (x_i - y_i)^2)
{code}
is the Euclidean distance between vectors. The root-mean-square distance is
{code}
sqrt(1/n * sum (x_i - y_i)^2)
{code}
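As a rough illustration of the two formulas above, here is a minimal Java sketch. The class and method names (DistanceSketch, euclidean, rootMeanSquare) are made up for this example and are not taken from Mahout; it also assumes both arrays have the same non-zero length.

```java
// Hypothetical helper class, for illustration only; not part of Mahout.
final class DistanceSketch {

    // Euclidean distance: sqrt(sum (x_i - y_i)^2)
    static double euclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sumSquaredDiff = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sumSquaredDiff += diff * diff;
        }
        return Math.sqrt(sumSquaredDiff);
    }

    // Root-mean-square distance: sqrt(1/n * sum (x_i - y_i)^2),
    // which is the Euclidean distance divided by sqrt(n). Assumes n > 0.
    static double rootMeanSquare(double[] x, double[] y) {
        return euclidean(x, y) / Math.sqrt(x.length);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] y = {2.0, 4.0, 5.0, 8.0};
        // Sum of squared differences is 1 + 4 + 4 + 16 = 25, and n = 4.
        System.out.println(euclidean(x, y));      // 5.0
        System.out.println(rootMeanSquare(x, y)); // 2.5
    }
}
```

The only difference between the two is the division by sqrt(n), which is exactly the factor in dispute in this issue.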
Converting a distance d into a similarity can plausibly be done using
{code}
1/(1+d)
{code}
or
{code}
1/(1+d^2)
{code}
or even
{code}
1/d
{code}
(if all vectors are distinct, so that d is never zero).
The idea is to use a form which has a maximum where distance is minimized.
With distances, you want to preserve qualities like the triangle inequality,
but not so much for similarities.
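To make the three conversions concrete, here is a small Java sketch. The class and method names are invented for this example and do not come from Mahout. Each function decreases as d grows, so the most similar pair is the one with the smallest distance.

```java
// Hypothetical helper class, for illustration only; not part of Mahout.
final class SimilaritySketch {

    // 1/(1+d): equals 1 at d = 0 and decreases toward 0 as d grows.
    static double inverseOnePlus(double d) {
        return 1.0 / (1.0 + d);
    }

    // 1/(1+d^2): also equals 1 at d = 0, but falls off faster for large d.
    static double inverseOnePlusSquare(double d) {
        return 1.0 / (1.0 + d * d);
    }

    // 1/d: unbounded as d approaches 0, so it is only usable when d > 0,
    // that is, when all vectors are distinct.
    static double inverse(double d) {
        if (d <= 0.0) {
            throw new IllegalArgumentException("distance must be positive");
        }
        return 1.0 / d;
    }

    public static void main(String[] args) {
        double[] distances = {0.5, 1.0, 2.0, 4.0};
        for (double d : distances) {
            System.out.printf("d=%.1f  1/(1+d)=%.4f  1/(1+d^2)=%.4f  1/d=%.4f%n",
                    d, inverseOnePlus(d), inverseOnePlusSquare(d), inverse(d));
        }
    }
}
```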
Using 1/(1+d) with the root-mean-square distance gives 1/(1 + distance/sqrt(n)).
That looks to me more like your "incorrect" version, which makes it seem like
the fault might be with the comment.
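For comparison, the two expressions from the issue description quoted below can be put side by side. This is only a sketch: sumXYdiff2 and n are the names used in the issue, while the class name, the method names, and the int type for n are assumptions made for this example.

```java
// Hypothetical helper class, for illustration only; not part of Mahout.
final class VariantSketch {

    // Variant proposed in the issue: 1 / ((1 + d) / sqrt(n)) = sqrt(n) / (1 + d).
    static double proposed(double sumXYdiff2, int n) {
        return 1.0 / ((1.0 + Math.sqrt(sumXYdiff2)) / Math.sqrt(n));
    }

    // Variant reported as the existing code: 1 / (1 + d / sqrt(n)),
    // which is 1 / (1 + root-mean-square distance).
    static double existing(double sumXYdiff2, int n) {
        return 1.0 / (1.0 + Math.sqrt(sumXYdiff2) / Math.sqrt(n));
    }

    public static void main(String[] args) {
        // Identical vectors (sum of squared differences is 0) over n = 4 dimensions.
        System.out.println(proposed(0.0, 4)); // 2.0
        System.out.println(existing(0.0, 4)); // 1.0
    }
}
```

With identical vectors the proposed form returns sqrt(n), so its maximum depends on the number of dimensions, whereas the existing form stays within (0, 1] for any distance.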
What do you think?
> Euclidean Distance Similarity Math
> -----------------------------------
>
> Key: MAHOUT-1602
> URL: https://issues.apache.org/jira/browse/MAHOUT-1602
> Project: Mahout
> Issue Type: Bug
> Components: Collaborative Filtering, Math
> Reporter: Leonardo Fernandez Sanchez
>
> The comment within the file:
> /mrlegacy/src/main/java/org/apache/mahout/cf/taste/impl/similarity/EuclideanDistanceSimilarity.java
> mentions that the implementation should be sqrt(n) / (1 + distance).
> Once the equation is simplified, it should be:
> 1 / ((1 + distance) / sqrt(n))
> Coded:
> return 1.0 / ((1.0 + Math.sqrt(sumXYdiff2)) / Math.sqrt(n));
> But instead it is (missing grouping brackets):
> 1 / (1 + distance / sqrt (n))
> Coded:
> return 1.0 / (1.0 + Math.sqrt(sumXYdiff2) / Math.sqrt(n));