While I agree that it's fairly meaningless mathematically, this ensures
that the distance between two identical vectors is always 0.
Think of yourself using this class through the DistanceMeasure interface.
The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
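
To make that concrete, here is a minimal sketch of a cosine distance that
keeps d(x, x) = 0 even for the zero vector. It works on plain double[]
arrays and is not Mahout's actual code: the class name CosineDistanceSketch
and the convention for a zero vs. non-zero pair (distance 1) are my own
assumptions for illustration.

```java
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Cosine distance 1 - cos(a, b), with explicit handling of zero vectors. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normSqA = 0.0;
        double normSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normSqA += a[i] * a[i];
            normSqB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normSqA) * Math.sqrt(normSqB);
        if (denominator == 0.0) {
            // At least one vector is all zeros, so the cosine is undefined.
            // Assumed convention: two zero vectors are identical (distance 0);
            // a zero vector and a non-zero vector are maximally dissimilar (distance 1).
            return (normSqA == 0.0 && normSqB == 0.0) ? 0.0 : 1.0;
        }
        // Clamp to [-1, 1] so rounding error cannot produce a negative distance.
        double cosine = Math.max(-1.0, Math.min(1.0, dot / denominator));
        return 1.0 - cosine;
    }
}
```

With this convention, a searcher that looks up an all-zero centroid finds
it at distance 0 rather than 1. Note that it only gives d(x, x) = 0; the
"only if" direction still fails for cosine distance, since any two
vectors pointing in the same direction are also at distance 0.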

[1] http://en.wikipedia.org/wiki/Metric_(mathematics)


On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> I think it should return an "undefined" symbol.  There is no angle between
> two zero vectors.
>
> In a practical sense, taking two zero vectors to be equivalent in the
> context of user-item vectors, say, is dodgy in my opinion.  That is akin to
> saying "If we both hate everything on this restaurant's menu we are the
> same person."
>
>
> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com>
> wrote:
>
> > Suneel is right. :)
> >
> > Let me explain how this came up:
> > - When clustering, and assigning a point to a cluster, the centroid needs
> > to be updated.
> > - To update the centroid in the nearest neighbor searcher classes, the
> > centroid must first be removed.
> > - To remove the centroid, we get the closest vector (search for it, and it
> > should be itself) and then remove it from the data structures.
> > => However, when the centroid is 0, the nearest vector (which should be
> > itself) has a huge distance (1 rather than 0) and this trips a check.
> >
> >
> > On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> >
> > > It sounds pretty undefined, but I would tend to define the distance as
> > > 0 in this case of course. And that means defining the cosine as 1.
> > > Which class in particular? There are a few implementations of this
> > > distance measure.
> > >
> > > On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com>
> > > wrote:
> > > > In the case where both vectors are all zeros, the angle between them is 0,
> > > > so the cosine is 1 and the distance returned should be 0
> > > > (unless I misunderstood what the distance does).
> > > >
> > > > In Mahout, when calling distance() however, if both the denominator and
> > > > dotProduct are 0 (which is true when both vectors are 0), the returned
> > > > value is 1.
> > > >
> > > > This looks like a bug to me and I would open a JIRA issue and fix it, but I
> > > > want to make sure there's nothing I could possibly be missing.
> > > >
> > > > Thoughts?
> > >
> >
>
