It is a good argument, and cosine distance is indeed discontinuous at the zero vector. In this context, though, the goal is to define a distance metric rather than to care about the actual angle, and 0 is probably a better definition than anything else. I think it's OK to say that two users for whom you have no info are equivalent for all intents and purposes.
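To make the convention concrete, here is a minimal sketch of a cosine distance that returns 0 for two all-zero vectors. It is not Mahout's actual code: the class name CosineDistanceSketch and the plain double[] signature are made up for illustration, and returning 1 when only one of the vectors is zero is an assumption of mine, not something settled in this thread.

```java
// Hypothetical illustration, not the Mahout implementation.
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Cosine distance 1 - cos(a, b), with the zero-vector cases handled explicitly. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0; // squared length of a
        double normB = 0.0; // squared length of b
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        if (denominator == 0.0) {
            // At least one vector is all zeros, so the angle is undefined.
            // Convention discussed in the thread: two zero vectors count as identical (distance 0).
            // Assumption: a zero vector against a non-zero one is treated as orthogonal (distance 1).
            return (normA == 0.0 && normB == 0.0) ? 0.0 : 1.0;
        }
        // Clamp to [-1, 1] so rounding error cannot produce a negative distance.
        double cosine = Math.max(-1.0, Math.min(1.0, dot / denominator));
        return 1.0 - cosine;
    }
}
```

With this convention a zero centroid is at distance 0 from itself, so a "find the closest vector, expect it to be itself" lookup would no longer trip on the all-zero case.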
On Thu, Apr 4, 2013 at 9:40 PM, Andrew Musselman <[email protected]> wrote:

> I think it should return an "undefined" symbol. There is no angle between
> two zero vectors.
>
> In a practical sense, taking two zero vectors to be equivalent in the
> context of user-item vectors, say, is dodgy in my opinion. That is akin to
> saying "If we both hate everything on this restaurant's menu we are the
> same person."
>
>
> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <[email protected]> wrote:
>
>> Suneel is right. :)
>>
>> Let me explain how this came up:
>> - When clustering, and assigning a point to a cluster, the centroid needs
>> to be updated.
>> - To update the centroid in the nearest neighbor searcher classes, the
>> centroid must first be removed.
>> - To remove the centroid, we get the closest vector (search for it, and it
>> should be itself) and then remove it from the data structures.
>> => However, when the centroid is 0, the nearest vector (which should be
>> itself) has a huge distance (1 rather than 0) and this trips a check.
>>
>>
>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <[email protected]> wrote:
>>
>> > It sounds pretty undefined, but I would tend to define the distance as
>> > 0 in this case of course. And that means defining the cosine as 1.
>> > Which class in particular? There are a few implementations of this
>> > distance measure.
>> >
>> > On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <[email protected]> wrote:
>> > > In the case where both vectors are all zeros, the angle between them is 0,
>> > > so the cosine is therefore 1 and the so the distance returned should be 0
>> > > (unless I misunderstood what the distance does).
>> > >
>> > > In Mahout, when calling distance() however, if both the denominator and
>> > > dotProduct are 0 (which is true when both vectors are 0), the returned
>> > > value is 1.
>> > >
>> > > This looks like a bug to me and I would open a JIRA issue and fix it but I
>> > > want to make sure there's nothing I could possibly be missing.
>> > >
>> > > Thoughts?
