On 04.04.2013 23:22, Dan Filimon wrote:
> Ah, okay then. :)
> I thought that you depend on the current convention that it returns 1. So,
> disclaimers aside, you're fine with the change?

Yes, I concur that the distance between two identical vectors should be
zero.
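For illustration, here is a minimal Java sketch of cosine distance, 1 - (x·y)/(‖x‖‖y‖), using the convention agreed on here: two all-zero vectors are at distance 0 (equivalently, the cosine is defined as 1), whereas the thread reports that Mahout's distance() returned 1 in that case. It works on plain double[] arrays rather than Mahout's Vector type, the class name is hypothetical, and returning 1 when exactly one vector is zero is an assumption, not something the thread settled.

```java
/**
 * Minimal cosine-distance sketch on plain double[] arrays (hypothetical class,
 * not Mahout's CosineDistanceMeasure): distance = 1 - cos(theta).
 */
public final class CosineDistanceSketch {

  private CosineDistanceSketch() {}

  public static double distance(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vectors must have the same dimension");
    }
    double dot = 0.0;
    double normASquared = 0.0;
    double normBSquared = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normASquared += a[i] * a[i];
      normBSquared += b[i] * b[i];
    }
    // A squared norm of exactly 0 is treated as the zero vector.
    boolean aIsZero = normASquared == 0.0;
    boolean bIsZero = normBSquared == 0.0;
    if (aIsZero && bIsZero) {
      // Convention agreed in the thread: identical (zero) vectors are at
      // distance 0, i.e. the cosine is defined as 1.
      return 0.0;
    }
    if (aIsZero || bIsZero) {
      // Assumption: the angle is undefined here too; treating the pair as
      // orthogonal (cosine 0) keeps a zero vector at a non-zero distance
      // from every non-zero vector.
      return 1.0;
    }
    double cosine = dot / (Math.sqrt(normASquared) * Math.sqrt(normBSquared));
    // Clamp to [-1, 1] so rounding cannot push the result outside [0, 2].
    cosine = Math.max(-1.0, Math.min(1.0, cosine));
    return 1.0 - cosine;
  }

  public static void main(String[] args) {
    double[] zero = new double[3];
    double[] x = {1.0, 2.0, 3.0};
    System.out.println(distance(zero, zero)); // 0.0
    System.out.println(distance(zero, x));    // 1.0
    System.out.println(distance(x, x));       // close to 0, up to rounding
  }
}
```

The special case only changes the result when both norms are exactly 0; for non-zero vectors the usual formula applies unchanged.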

> 
> 
> On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> 
>> You can ignore the recommender stuff for the DistanceMeasure classes, as
>> the recommenders use their own distance/similarity implementations.
>>
>> I just wanted to comment on the example that Andrew gave, to mention
>> that there are some common pitfalls with modeling ratings/interactions.
>>
>> On 04.04.2013 23:14, Dan Filimon wrote:
>>> Right, that's fair. So, you're saying there needs to be a special value
>>> when both vectors are 0 for the recommender system to work?
>>> And that 0 means dislike, which isn't in fact accurate. You want to
>>> convey lack of information.
>>>
>>> But now, the code returns 1. Is that a special value? I'd guess it means
>>> you like it by default...?
>>>
>>>
>>> On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
>>>
>>>> In recommender systems, it's dangerous to interpret "no interaction" as
>>>> dislike. Think of all movies you never watched, do you really dislike
>>>> them all? :)
>>>>
>>>>
>>>> On 04.04.2013 23:03, Andrew Musselman wrote:
>>>>> I agree; I misspoke before if I said "dislike". Zero to me means
>>>>> literally nothing: no interaction. That could mean "don't like",
>>>>> "don't like today", "dislike", etc., which adds to the meaninglessness
>>>>> of it.
>>>>>
>>>>>
>>>>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
>>>>>
>>>>>> I think that in our recommender code, 0 should mean no rating or no
>>>>>> interaction observed. I think modeling dislike with 0 creates a lot of
>>>>>> unnecessary problems.
>>>>>>
>>>>>> On 04.04.2013 22:56, Andrew Musselman wrote:
>>>>>>> I see the arguments for having it defined; I'm just raising the point
>>>>>>> that it's a very strange spot to be in.
>>>>>>>
>>>>>>> If all users are zero except for one person who likes the lentil soup,
>>>>>>> then the other users are equally different from that person.
>>>>>>>
>>>>>>> The problem for me is the discontinuity Sean mentions, where at zero you
>>>>>>> go off a cliff and have no sense of distance.
>>>>>>>
>>>>>>> But for convenience and "behaving nicely", I'm fine with the distance
>>>>>>> between zero vectors being zero.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
>>>>>>>
>>>>>>>> While I agree that it's fairly meaningless mathematically, this
>>>>>>>> ensures that "the distance between two identical vectors is 0" always
>>>>>>>> holds. Think of yourself using this class through the DistanceMeasure
>>>>>>>> interface. The implicit expectation [1] here is that d(x, y) = 0 iff
>>>>>>>> x = y.
>>>>>>>>
>>>>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I think it should return an "undefined" symbol.  There is no angle
>>>>>>>>> between two zero vectors.
>>>>>>>>>
>>>>>>>>> In a practical sense, taking two zero vectors to be equivalent in the
>>>>>>>>> context of user-item vectors, say, is dodgy in my opinion.  That is
>>>>>>>>> akin to saying "If we both hate everything on this restaurant's menu
>>>>>>>>> we are the same person."
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Suneel is right. :)
>>>>>>>>>>
>>>>>>>>>> Let me explain how this came up:
>>>>>>>>>> - When clustering, and assigning a point to a cluster, the centroid
>>>>>>>>>>   needs to be updated.
>>>>>>>>>> - To update the centroid in the nearest-neighbor searcher classes,
>>>>>>>>>>   the centroid must first be removed.
>>>>>>>>>> - To remove the centroid, we get the closest vector (search for it;
>>>>>>>>>>   it should be the centroid itself) and then remove it from the data
>>>>>>>>>>   structures.
>>>>>>>>>> => However, when the centroid is 0, the nearest vector (which should
>>>>>>>>>> be itself) is at a large distance (1 rather than 0), and this trips
>>>>>>>>>> a check.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> It sounds pretty undefined, but I would tend to define the distance
>>>>>>>>>>> as 0 in this case of course. And that means defining the cosine as 1.
>>>>>>>>>>> Which class in particular? There are a few implementations of this
>>>>>>>>>>> distance measure.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
>>>>>>>>>>>> In the case where both vectors are all zeros, the angle between them
>>>>>>>>>>>> is 0, so the cosine is 1 and the distance returned should be 0
>>>>>>>>>>>> (unless I misunderstood what the distance does).
>>>>>>>>>>>>
>>>>>>>>>>>> In Mahout, however, when calling distance(), if both the denominator
>>>>>>>>>>>> and dotProduct are 0 (which is true when both vectors are 0), the
>>>>>>>>>>>> returned value is 1.
>>>>>>>>>>>>
>>>>>>>>>>>> This looks like a bug to me, and I would open a JIRA issue and fix
>>>>>>>>>>>> it, but I want to make sure there's nothing I could possibly be
>>>>>>>>>>>> missing.
>>>>>>>>>>>>
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 
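The clustering scenario Dan describes can be sketched the same way, under stated assumptions: RemoveCentroidSketch, its brute-force search, and the EPSILON tolerance are hypothetical stand-ins for Mahout's nearest-neighbor searcher classes and their check, which the thread does not show. The sketch searches for the stored vector closest to the centroid, expects it to be the centroid itself at distance ~0, and removes it; with a zero centroid, returning 1 for the both-zero case trips the check, while returning 0 does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical brute-force searcher (not Mahout's API) illustrating the
 * "find the centroid, then remove it" step described in the thread.
 */
public final class RemoveCentroidSketch {

  /** Hypothetical tolerance for "the nearest vector is the query itself". */
  static final double EPSILON = 1e-9;

  private final List<double[]> points = new ArrayList<>();
  private final ToDoubleBiFunction<double[], double[]> distance;

  public RemoveCentroidSketch(ToDoubleBiFunction<double[], double[]> distance) {
    this.distance = distance;
  }

  public void add(double[] point) {
    points.add(point);
  }

  public int size() {
    return points.size();
  }

  /** Removes the stored vector nearest to {@code query}; throws if it is not at distance ~0. */
  public void remove(double[] query) {
    int best = -1;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int i = 0; i < points.size(); i++) {
      double d = distance.applyAsDouble(query, points.get(i));
      if (d < bestDistance) {
        bestDistance = d;
        best = i;
      }
    }
    // The sanity check: the closest vector should be the query itself.
    if (best < 0 || bestDistance > EPSILON) {
      throw new IllegalStateException(
          "nearest vector is at distance " + bestDistance + ", expected ~0");
    }
    points.remove(best); // remove(int index), not remove(Object)
  }

  /** Cosine distance; {@code bothZeroValue} is returned when both vectors are all zeros. */
  public static double cosineDistance(double[] a, double[] b, double bothZeroValue) {
    double dot = 0.0, normASquared = 0.0, normBSquared = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normASquared += a[i] * a[i];
      normBSquared += b[i] * b[i];
    }
    if (normASquared == 0.0 && normBSquared == 0.0) {
      return bothZeroValue;
    }
    if (normASquared == 0.0 || normBSquared == 0.0) {
      return 1.0; // assumption: exactly one zero vector is treated as orthogonal
    }
    double cosine = dot / (Math.sqrt(normASquared) * Math.sqrt(normBSquared));
    return 1.0 - Math.max(-1.0, Math.min(1.0, cosine));
  }

  public static void main(String[] args) {
    double[] zeroCentroid = {0.0, 0.0};
    double[] other = {1.0, 0.0};

    // Reported old behavior: two zero vectors are at distance 1.
    RemoveCentroidSketch oldStyle =
        new RemoveCentroidSketch((a, b) -> cosineDistance(a, b, 1.0));
    oldStyle.add(zeroCentroid);
    oldStyle.add(other);
    try {
      oldStyle.remove(zeroCentroid);
      System.out.println("bothZeroValue = 1: removed");
    } catch (IllegalStateException e) {
      System.out.println("bothZeroValue = 1: " + e.getMessage());
    }

    // Proposed behavior: two zero vectors are at distance 0.
    RemoveCentroidSketch newStyle =
        new RemoveCentroidSketch((a, b) -> cosineDistance(a, b, 0.0));
    newStyle.add(zeroCentroid);
    newStyle.add(other);
    newStyle.remove(zeroCentroid);
    System.out.println("bothZeroValue = 0: size after removal = " + newStyle.size());
  }
}
```

One caveat with relying on distance 0 alone: cosine distance is also 0 between parallel vectors such as (1, 0) and (2, 0), so such a search may find a different stored vector than the exact instance being updated.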
