While I agree that it's fairly meaningless mathematically, this ensures that the distance between two identical vectors is always 0. Think of yourself using this class through the DistanceMeasure interface. The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
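
To make the convention concrete, here is a minimal sketch on plain double[] arrays. This is not Mahout's actual code: the class name CosineDistanceSketch and its distance() method are hypothetical, and returning 1 when exactly one of the vectors is zero is my own assumption, since the thread only discusses the case where both are zero.

```java
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Cosine distance 1 - cos(a, b), with d(0, 0) defined as 0. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dotProduct = 0.0;
        double normA = 0.0; // squared length of a
        double normB = 0.0; // squared length of b
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        if (denominator == 0.0) {
            // The cosine is undefined here. Convention: two zero vectors are
            // identical, so their distance is 0. If only one vector is zero,
            // return 1 (an assumption of this sketch, not from the thread).
            return (normA == 0.0 && normB == 0.0) ? 0.0 : 1.0;
        }
        // Clamp so rounding error cannot push the cosine outside [-1, 1].
        double cosine = Math.max(-1.0, Math.min(1.0, dotProduct / denominator));
        return 1.0 - cosine;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0, 0.0};
        double[] v = {1.0, 2.0, 3.0};
        System.out.println("d(0, 0) = " + distance(zero, zero));
        System.out.println("d(0, v) = " + distance(zero, v));
        System.out.println("d(v, v) = " + distance(v, v));
    }
}
```

With this convention, a zero centroid that searches for its own nearest neighbor finds itself at distance 0, which is what the removal check Dan describes in the quoted thread expects.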

[1] http://en.wikipedia.org/wiki/Metric_(mathematics)

On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

> I think it should return an "undefined" symbol. There is no angle between
> two zero vectors.
>
> In a practical sense, taking two zero vectors to be equivalent in the
> context of user-item vectors, say, is dodgy in my opinion. That is akin to
> saying "If we both hate everything on this restaurant's menu we are the
> same person."
>
> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
>
> > Suneel is right. :)
> >
> > Let me explain how this came up:
> > - When clustering, and assigning a point to a cluster, the centroid needs
> >   to be updated.
> > - To update the centroid in the nearest neighbor searcher classes, the
> >   centroid must first be removed.
> > - To remove the centroid, we get the closest vector (search for it, and it
> >   should be itself) and then remove it from the data structures.
> > => However, when the centroid is 0, the nearest vector (which should be
> >    itself) has a huge distance (1 rather than 0) and this trips a check.
> >
> > On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> >
> > > It sounds pretty undefined, but I would tend to define the distance as
> > > 0 in this case of course. And that means defining the cosine as 1.
> > > Which class in particular? There are a few implementations of this
> > > distance measure.
> > >
> > > On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > >
> > > > In the case where both vectors are all zeros, the angle between them is 0,
> > > > so the cosine is therefore 1 and the so the distance returned should be 0
> > > > (unless I misunderstood what the distance does).
> > > >
> > > > In Mahout, when calling distance() however, if both the denominator and
> > > > dotProduct are 0 (which is true when both vectors are 0), the returned
> > > > value is 1.
> > > >
> > > > This looks like a bug to me and I would open a JIRA issue and fix it but I
> > > > want to make sure there's nothing I could possibly be missing.
> > > >
> > > > Thoughts?