It is a good argument, and cosine distance is indeed discontinuous at 0.
In the context here, though, the goal is to define a distance metric
rather than to care about the actual angle, and 0 is probably a better
value to use than anything else. I think it's OK to say that two users
for whom you have no info are equivalent for all intents and purposes.
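
A minimal sketch of that convention, to make it concrete. This is not
Mahout's actual code: the class and method names are made up for
illustration, and the choice of 1 for a zero vector against a non-zero
vector is an assumption that the thread does not settle.

```java
// Illustrative sketch only; not Mahout's implementation.
final class CosineDistanceSketch {

    /** Returns 1 - cos(a, b), with the zero-vector cases handled explicitly. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        if (denominator == 0.0) {
            // Assumption: two zero vectors count as identical (distance 0).
            // A zero vector against a non-zero one is given distance 1 here;
            // that second choice is this sketch's own, not from the thread.
            return (normA == 0.0 && normB == 0.0) ? 0.0 : 1.0;
        }
        // Clamp so rounding cannot push the cosine slightly outside [-1, 1].
        double cosine = Math.max(-1.0, Math.min(1.0, dot / denominator));
        return 1.0 - cosine;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0, 0.0};
        double[] x = {1.0, 2.0, 3.0};
        System.out.println("zero vs zero: " + distance(zero, zero));
        System.out.println("zero vs x:    " + distance(zero, x));
        System.out.println("x vs x:       " + distance(x, x));
    }
}
```

The only behavioral difference from what Dan reports is the first
branch: both norms zero gives 0 instead of 1.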

On Thu, Apr 4, 2013 at 9:40 PM, Andrew Musselman
<[email protected]> wrote:
> I think it should return an "undefined" symbol.  There is no angle between
> two zero vectors.
>
> In a practical sense, taking two zero vectors to be equivalent in the
> context of user-item vectors, say, is dodgy in my opinion.  That is akin to
> saying "If we both hate everything on this restaurant's menu we are the
> same person."
>
>
> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon
> <[email protected]> wrote:
>
>> Suneel is right. :)
>>
>> Let me explain how this came up:
>> - When clustering, and assigning a point to a cluster, the centroid needs
>> to be updated.
>> - To update the centroid in the nearest neighbor searcher classes, the
>> centroid must first be removed.
>> - To remove the centroid, we get the closest vector (search for it, and it
>> should be itself) and then remove it from the data structures.
>> => However, when the centroid is 0, the nearest vector (which should be
>> itself) has a huge distance (1 rather than 0) and this trips a check.
>>
>>
>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <[email protected]> wrote:
>>
>> > It sounds pretty undefined, but I would tend to define the distance as
>> > 0 in this case of course. And that means defining the cosine as 1.
>> > Which class in particular? There are a few implementations of this
>> > distance measure.
>> >
>> > On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <[email protected]>
>> > wrote:
>> > > In the case where both vectors are all zeros, the angle between them is 0,
>> > > so the cosine is therefore 1 and so the distance returned should be 0
>> > > (unless I misunderstood what the distance does).
>> > >
>> > > In Mahout, when calling distance() however, if both the denominator and
>> > > dotProduct are 0 (which is true when both vectors are 0), the returned
>> > > value is 1.
>> > >
>> > > This looks like a bug to me and I would open a JIRA issue and fix it but I
>> > > want to make sure there's nothing I could possibly be missing.
>> > >
>> > > Thoughts?
>> >
>>
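
The removal failure Dan describes in the quoted thread can be
reproduced with a small brute-force sketch. Everything here is
hypothetical (the searcher class, the `Distance` interface, and the
`EPSILON` tolerance are stand-ins, not Mahout's searcher API); it only
shows why a zero centroid cannot find itself when the zero-to-zero
distance is 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a nearest-neighbor searcher; not Mahout's API.
final class RemovalCheckSketch {

    interface Distance {
        double between(double[] a, double[] b);
    }

    // Assumed tolerance for "the nearest vector is the vector itself".
    private static final double EPSILON = 1e-9;

    private final List<double[]> vectors = new ArrayList<>();
    private final Distance distance;

    RemovalCheckSketch(Distance distance) {
        this.distance = distance;
    }

    void add(double[] v) {
        vectors.add(v);
    }

    /** Removes the stored vector nearest to v; returns false if it is not within EPSILON. */
    boolean remove(double[] v) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < vectors.size(); i++) {
            double d = distance.between(vectors.get(i), v);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        if (best < 0 || bestDistance > EPSILON) {
            return false; // the check that a zero centroid trips
        }
        vectors.remove(best); // removes by index
        return true;
    }

    /** Cosine distance where the value for two zero vectors is a parameter. */
    static double cosineDistance(double[] a, double[] b, double zeroZeroValue) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 && normB == 0.0) {
            return zeroZeroValue;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // assumption of this sketch, not discussed in the thread
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] centroid = {0.0, 0.0};

        RemovalCheckSketch reported = new RemovalCheckSketch((a, b) -> cosineDistance(a, b, 1.0));
        reported.add(centroid);
        System.out.println("zero-zero = 1, removed: " + reported.remove(centroid));

        RemovalCheckSketch proposed = new RemovalCheckSketch((a, b) -> cosineDistance(a, b, 0.0));
        proposed.add(centroid);
        System.out.println("zero-zero = 0, removed: " + proposed.remove(centroid));
    }
}
```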
