I think that in our recommender code, 0 should mean "no rating" or "no
interaction observed"; modeling dislike with 0 creates a lot of
unnecessary problems.
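
To make the concern concrete, here is a minimal Java sketch. It is not Mahout code. It assumes preferences are stored sparsely, so an item absent from a user's map reads back as 0. The class name PreferenceEncoding, the helper indistinguishable(), and the -1.0 value used for dislike are all hypothetical. If dislike is stored as 0, a user who dislikes every item becomes the same all-zero vector as a user with no interactions at all, which is the zero-vector case debated in the thread below.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Mahout code): preferences are stored sparsely, so an
 * item missing from a user's map reads back as 0, i.e. "no interaction observed".
 */
public class PreferenceEncoding {

    // Unobserved items default to 0.0, as in a sparse vector.
    static double pref(Map<String, Double> user, String item) {
        return user.getOrDefault(item, 0.0);
    }

    // True if the two users read back identically on every listed item.
    static boolean indistinguishable(Map<String, Double> a, Map<String, Double> b, String[] items) {
        for (String item : items) {
            if (pref(a, item) != pref(b, item)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] menu = {"lentil soup", "salad", "steak"};
        Map<String, Double> neverVisited = new HashMap<>(); // no interactions at all

        // Encoding A (the one argued against): dislike stored as 0.0.
        Map<String, Double> dislikeAsZero = new HashMap<>();
        // Encoding B (hypothetical alternative): dislike stored as a nonzero value, -1.0.
        Map<String, Double> dislikeAsNegative = new HashMap<>();
        for (String item : menu) {
            dislikeAsZero.put(item, 0.0);
            dislikeAsNegative.put(item, -1.0);
        }

        // A collapses onto the "never visited" user; B stays distinguishable.
        System.out.println("A vs never visited: " + indistinguishable(dislikeAsZero, neverVisited, menu));
        System.out.println("B vs never visited: " + indistinguishable(dislikeAsNegative, neverVisited, menu));
    }
}
```

Under encoding A the two users are identical all-zero vectors, so no similarity measure can separate "disliked everything" from "never seen anything"; reserving 0 for "unobserved" avoids that ambiguity.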

On 04.04.2013 22:56, Andrew Musselman wrote:
> I see the arguments for having it defined; I'm just raising the point that
> it's a very strange spot to be in.
> 
> If all users are zero except for one person who likes the lentil soup, then
> the other users are equally different from that person.
> 
> The problem for me is the discontinuity Sean mentions, where at zero you go
> off a cliff and have no sense of distance.
> 
> But for convenience and "behaving nicely" I'm fine with distance between
> zero vectors being zero.
> 
> 
> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon
> <dangeorge.fili...@gmail.com> wrote:
> 
>> While I agree that it's fairly meaningless mathematically, this ensures
>> that the property "the distance between two identical vectors is 0" always
>> holds. Think of yourself using this class through the DistanceMeasure
>> interface. The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
>>
>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
>>
>>
>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <
>> andrew.mussel...@gmail.com> wrote:
>>
>>> I think it should return an "undefined" symbol. There is no angle between
>>> two zero vectors.
>>>
>>> In a practical sense, taking two zero vectors to be equivalent in the
>>> context of user-item vectors, say, is dodgy in my opinion. That is akin to
>>> saying "If we both hate everything on this restaurant's menu, we are the
>>> same person."
>>>
>>>
>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <
>>> dangeorge.fili...@gmail.com> wrote:
>>>
>>>> Suneel is right. :)
>>>>
>>>> Let me explain how this came up:
>>>> - When clustering, assigning a point to a cluster means its centroid
>>>> needs to be updated.
>>>> - To update the centroid in the nearest-neighbor searcher classes, the
>>>> centroid must first be removed.
>>>> - To remove the centroid, we search for the closest vector (which should
>>>> be the centroid itself) and then remove it from the data structures.
>>>> => However, when the centroid is the zero vector, the nearest vector
>>>> (which should be itself) is at a huge distance (1 rather than 0), and this
>>>> trips a check.
>>>>
>>>>
>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> It sounds pretty undefined, but I would tend to define the distance as
>>>>> 0 in this case, of course. And that means defining the cosine as 1.
>>>>> Which class in particular? There are a few implementations of this
>>>>> distance measure.
>>>>>
>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <
>>>>> dangeorge.fili...@gmail.com> wrote:
>>>>>> In the case where both vectors are all zeros, the angle between them is
>>>>>> 0, so the cosine is 1 and the distance returned should therefore be 0
>>>>>> (unless I misunderstood what the distance does).
>>>>>>
>>>>>> In Mahout, however, when calling distance(), if both the denominator and
>>>>>> the dotProduct are 0 (which is true when both vectors are 0), the
>>>>>> returned value is 1.
>>>>>>
>>>>>> This looks like a bug to me, and I would open a JIRA issue and fix it,
>>>>>> but I want to make sure there's nothing I could possibly be missing.
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>
>>>
>>
> 
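
The convention Sean suggests (distance 0 between two zero vectors) and the discontinuity Andrew points out can be illustrated with a small standalone Java sketch. It is not Mahout's implementation; cosineDistance() and scale() are hypothetical helpers. The thread reports Mahout returning 1 whenever the denominator and the dot product are both 0, which also covers the case where only one vector is zero; the sketch keeps 1 for that one-zero case as an assumption and returns 0 only when both vectors are zero.

```java
/**
 * Hypothetical sketch, not Mahout's CosineDistanceMeasure: cosine distance
 * 1 - cos(a, b) with an explicit rule for zero vectors.
 */
public class CosineDistanceSketch {

    // Plain dot product of two equal-length arrays.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double normA = Math.sqrt(dot(a, a));
        double normB = Math.sqrt(dot(b, b));
        if (normA == 0.0 && normB == 0.0) {
            // Proposed convention: d(0, 0) = 0, so d(x, x) = 0 holds for every x.
            return 0.0;
        }
        if (normA == 0.0 || normB == 0.0) {
            // Angle undefined; keep the value the thread reports for this case (assumption).
            return 1.0;
        }
        return 1.0 - dot(a, b) / (normA * normB);
    }

    // Multiplies every component of v by eps.
    static double[] scale(double[] v, double eps) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = eps * v[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.0};
        double[] y = {0.0, 1.0};
        // Shrink two orthogonal vectors toward the origin: the distance stays 1.0
        // for every nonzero eps, then drops to 0.0 once both become the zero vector.
        for (double eps : new double[] {1.0, 1e-3, 1e-9, 0.0}) {
            double d = cosineDistance(scale(x, eps), scale(y, eps));
            System.out.println("eps = " + eps + ", d = " + d);
        }
    }
}
```

Cosine distance depends only on direction, so it cannot shrink as vectors approach the origin; any value chosen for d(0, 0) is a convention, and 0 is the one that keeps d(x, x) = 0 true.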

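Dan's centroid-removal step can be sketched the same way. BruteSearcher, removeSelf(), cosine(), and the epsilon threshold are hypothetical names chosen for illustration, not Mahout's nearest-neighbor searcher API. The distance function is parameterized so the old value d(0, 0) = 1 and the proposed d(0, 0) = 0 can be compared: with d(0, 0) = 1, every stored vector is at distance 1 from a zero centroid, so the self-lookup finds nothing within epsilon and the sanity check throws; with d(0, 0) = 0, the centroid finds itself and is removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical sketch of "remove the centroid before updating it".
 * None of these names are Mahout's API.
 */
public class CentroidRemovalSketch {

    // Cosine distance where two zero vectors get zeroZeroValue and one zero vector gets 1.0.
    static ToDoubleBiFunction<double[], double[]> cosine(double zeroZeroValue) {
        return (a, b) -> {
            double dotAB = 0.0, dotAA = 0.0, dotBB = 0.0;
            for (int i = 0; i < a.length; i++) {
                dotAB += a[i] * b[i];
                dotAA += a[i] * a[i];
                dotBB += b[i] * b[i];
            }
            double denom = Math.sqrt(dotAA) * Math.sqrt(dotBB);
            if (denom == 0.0) {
                return (dotAA == 0.0 && dotBB == 0.0) ? zeroZeroValue : 1.0;
            }
            return 1.0 - dotAB / denom;
        };
    }

    // Hypothetical brute-force searcher holding the current centroids.
    static final class BruteSearcher {
        private final List<double[]> points = new ArrayList<>();
        private final ToDoubleBiFunction<double[], double[]> distance;

        BruteSearcher(ToDoubleBiFunction<double[], double[]> distance) {
            this.distance = distance;
        }

        void add(double[] v) {
            points.add(v);
        }

        int size() {
            return points.size();
        }

        // Finds the stored vector closest to query and removes it, but only if it is
        // within epsilon (i.e. it should be the query itself); otherwise the check trips.
        void removeSelf(double[] query, double epsilon) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int i = 0; i < points.size(); i++) {
                double d = distance.applyAsDouble(query, points.get(i));
                if (d < bestDist) {
                    bestDist = d;
                    best = i;
                }
            }
            if (best < 0 || bestDist > epsilon) {
                throw new IllegalStateException("nearest vector is at distance " + bestDist);
            }
            points.remove(best); // remove(int index)
        }
    }

    public static void main(String[] args) {
        double[] zeroCentroid = {0.0, 0.0};
        double[] other = {1.0, 2.0};
        // Compare the old convention d(0,0) = 1 with the proposed d(0,0) = 0.
        for (double zeroZero : new double[] {1.0, 0.0}) {
            BruteSearcher searcher = new BruteSearcher(cosine(zeroZero));
            searcher.add(other);
            searcher.add(zeroCentroid);
            try {
                searcher.removeSelf(zeroCentroid, 1e-9);
                System.out.println("d(0,0)=" + zeroZero + ": removed, " + searcher.size() + " left");
            } catch (IllegalStateException e) {
                System.out.println("d(0,0)=" + zeroZero + ": check tripped (" + e.getMessage() + ")");
            }
        }
    }
}
```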