You can ignore the recommender stuff for the DistanceMeasure classes, as
the recommenders use their own distance/similarity implementations.

I just wanted to comment on the example that Andrew gave, to point out
that there are some common pitfalls when modeling ratings/interactions.

On 04.04.2013 23:14, Dan Filimon wrote:
> Right, that's fair. So, you're saying there needs to be a special value
> when both vectors are 0 for the recommender system to work?
> And that 0 means dislike, which isn't in fact accurate. You want to convey
> lack of information.
> 
> But now, the code returns 1. Is that a special value? I'd guess it means
> you like it by default...?
> 
> 
> On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <[email protected]> wrote:
> 
>> In recommender systems, it's dangerous to interpret "no interaction" as
>> dislike. Think of all movies you never watched, do you really dislike
>> them all? :)
>>
>>
>> On 04.04.2013 23:03, Andrew Musselman wrote:
>>> I agree; I misspoke before if I said "dislike". Zero to me means
>>> literally nothing: no interaction, which could mean "don't like",
>>> "don't like today", "dislike", etc. That adds to the meaninglessness of
>>> it.
>>>
>>>
>>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter <[email protected]> wrote:
>>>
>>>> I think that in our recommender code, 0 should mean no rating or no
>>>> interaction observed. I think modeling dislike with 0 creates a lot of
>>>> unnecessary problems.
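A minimal sketch of what "0 means no interaction observed" can look like in code, assuming ratings are kept in a map where an absent item means the user never interacted with it, so nothing is ever stored as 0. The class and method names are hypothetical, and restricting the similarity to co-rated items is just one common option, not necessarily what Mahout's recommenders do. When two users share no items, the sketch returns an empty result instead of pretending the similarity is 0.

```java
import java.util.Map;
import java.util.OptionalDouble;

/**
 * Hypothetical sketch: an absent item in the map means "no interaction observed".
 * Not Mahout's DataModel or similarity API.
 */
final class SparseRatingsSketch {
  private SparseRatingsSketch() {}

  /** Cosine similarity over co-rated items only; empty when there is no evidence. */
  static OptionalDouble coRatedCosine(Map<String, Double> u, Map<String, Double> v) {
    double dot = 0.0, nu = 0.0, nv = 0.0;
    int overlap = 0;
    for (Map.Entry<String, Double> e : u.entrySet()) {
      Double other = v.get(e.getKey());
      if (other == null) {
        continue; // v never interacted with this item: skip it rather than treat it as 0
      }
      double a = e.getValue();
      double b = other;
      dot += a * b;
      nu += a * a;
      nv += b * b;
      overlap++;
    }
    double denominator = Math.sqrt(nu) * Math.sqrt(nv);
    if (overlap == 0 || denominator == 0.0) {
      return OptionalDouble.empty(); // no shared items, so no similarity to report
    }
    return OptionalDouble.of(dot / denominator);
  }
}
```

One caveat of co-rated-only cosine: with a single shared item and positive ratings the result is always 1, so a minimum overlap is often required in practice.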
>>>>
>>>> On 04.04.2013 22:56, Andrew Musselman wrote:
>>>>> I see the arguments for having it defined; I'm just raising the point
>>>>> that it's a very strange spot to be in.
>>>>>
>>>>> If all users are zero except for one person who likes the lentil soup,
>>>>> then the other users are equally different from that person.
>>>>>
>>>>> The problem for me is the discontinuity Sean mentions, where at zero you
>>>>> go off a cliff and have no sense of distance.
>>>>>
>>>>> But for convenience and "behaving nicely", I'm fine with the distance
>>>>> between zero vectors being zero.
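A small sketch of the discontinuity, assuming cosine distance d(x, y) = 1 - cos(x, y) and the convention that the 0/0 case returns 0 (the class name is hypothetical, not Mahout's). Shrinking two orthogonal vectors toward zero never changes their distance, so no single value chosen at the zero vector makes the function continuous there. Because the denominator is the product of the two norms, the same 0/0 branch also fires when only one vector is zero, so in this sketch an all-zero user lands at distance 0 from the lentil-soup fan.

```java
/** Hypothetical sketch; the 0/0 convention below is an assumption, not Mahout's code. */
final class ZeroDiscontinuitySketch {
  private ZeroDiscontinuitySketch() {}

  /** Cosine distance with the 0/0 case defined as 0. */
  static double distance(double[] a, double[] b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    double denominator = Math.sqrt(na) * Math.sqrt(nb);
    if (denominator == 0.0 && dot == 0.0) {
      return 0.0;
    }
    return 1.0 - dot / denominator;
  }

  public static void main(String[] args) {
    // Two orthogonal directions, scaled down toward the zero vector: d stays 1.0.
    for (double eps : new double[] {1.0, 1e-3, 1e-9}) {
      double d = distance(new double[] {eps, 0.0}, new double[] {0.0, eps});
      System.out.println("eps=" + eps + " d=" + d);
    }
    // At the zero vector itself, the convention takes over.
    System.out.println("d(0, 0) = " + distance(new double[2], new double[2]));
    // One user likes lentil soup (item 0); another has no interactions at all.
    double[] lentilFan = {1.0, 0.0, 0.0};
    double[] noInteractions = new double[3];
    System.out.println("d(0, lentilFan) = " + distance(noInteractions, lentilFan));
  }
}
```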
>>>>>
>>>>>
>>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <[email protected]> wrote:
>>>>>
>>>>>> While I agree that it's fairly meaningless mathematically, this ensures
>>>>>> that the distance between two identical vectors is always 0.
>>>>>> Think of yourself using this class through the DistanceMeasure interface.
>>>>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
>>>>>>
>>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
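A quick check of that expectation, as a sketch with a hypothetical helper whose `zeroCase` parameter is the value returned for the 0/0 case (1 as currently reported, 0 as proposed). With 0, d(x, x) = 0 holds for every x, including the zero vector. The "only if" half cannot hold for cosine distance in any case: it depends only on direction, so d(x, 2x) = 0 even though x and 2x differ.

```java
/** Hypothetical helper for checking d(x, x) = 0; not Mahout's CosineDistanceMeasure. */
final class IdentityCheckSketch {
  private IdentityCheckSketch() {}

  static double distance(double[] a, double[] b, double zeroCase) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    double denominator = Math.sqrt(na) * Math.sqrt(nb);
    if (denominator == 0.0 && dot == 0.0) {
      return zeroCase; // the 0/0 corner case
    }
    return 1.0 - dot / denominator;
  }

  public static void main(String[] args) {
    double[] zero = new double[2];
    double[] x = {1.0, 0.0};
    double[] twoX = {2.0, 0.0};
    System.out.println(distance(zero, zero, 1.0)); // reported behaviour: d(0, 0) = 1
    System.out.println(distance(zero, zero, 0.0)); // proposed: d(0, 0) = 0
    System.out.println(distance(x, twoX, 0.0));    // 0 although x != 2x
  }
}
```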
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <[email protected]> wrote:
>>>>>>
>>>>>>> I think it should return an "undefined" symbol.  There is no angle
>>>>>>> between two zero vectors.
>>>>>>>
>>>>>>> In a practical sense, taking two zero vectors to be equivalent in the
>>>>>>> context of user-item vectors, say, is dodgy in my opinion.  That is akin
>>>>>>> to saying "If we both hate everything on this restaurant's menu, we are
>>>>>>> the same person."
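One way to return an "undefined" value is Double.NaN, sketched below with a hypothetical class (not Mahout's API). In Java, 0.0 / 0.0 already evaluates to NaN; the explicit branch just makes the intent visible. The catch is that every ordered comparison with NaN is false, so a nearest-neighbour loop like `if (d < best)` silently skips such points unless callers check Double.isNaN.

```java
/** Hypothetical variant that reports the zero-vector case as NaN instead of 0 or 1. */
final class UndefinedCosineSketch {
  private UndefinedCosineSketch() {}

  static double distance(double[] a, double[] b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    double denominator = Math.sqrt(na) * Math.sqrt(nb);
    if (denominator == 0.0) {
      // No angle is defined when either vector has zero length.
      return Double.NaN;
    }
    return 1.0 - dot / denominator;
  }

  public static void main(String[] args) {
    double d = distance(new double[2], new double[2]);
    System.out.println(Double.isNaN(d)); // true
    System.out.println(d < 1.0);         // false: comparisons with NaN are always false
  }
}
```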
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <[email protected]> wrote:
>>>>>>>
>>>>>>>> Suneel is right. :)
>>>>>>>>
>>>>>>>> Let me explain how this came up:
>>>>>>>> - When clustering, after assigning a point to a cluster, the centroid
>>>>>>>> needs to be updated.
>>>>>>>> - To update the centroid in the nearest neighbor searcher classes, the
>>>>>>>> centroid must first be removed.
>>>>>>>> - To remove the centroid, we get the closest vector (search for it, and
>>>>>>>> it should be itself) and then remove it from the data structures.
>>>>>>>> => However, when the centroid is the zero vector, the nearest vector
>>>>>>>> (which should be itself) has a huge distance (1 rather than 0) and this
>>>>>>>> trips a check.
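The removal step described above, sketched as a hypothetical brute-force searcher. The class, `removeExact` and its `epsilon` tolerance are invented for illustration; Mahout's searcher classes are organised differently. The query is expected to find itself at distance about 0, so if the 0/0 case returns 1, a zero centroid trips the check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical brute-force searcher; not Mahout's searcher API. */
final class BruteSearcherSketch {
  private final List<double[]> points = new ArrayList<>();
  private final ToDoubleBiFunction<double[], double[]> distance;

  BruteSearcherSketch(ToDoubleBiFunction<double[], double[]> distance) {
    this.distance = distance;
  }

  /** Cosine distance; zeroCase is returned when dot product and denominator are both 0. */
  static double cosineDistance(double[] a, double[] b, double zeroCase) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    double denominator = Math.sqrt(na) * Math.sqrt(nb);
    if (denominator == 0.0 && dot == 0.0) {
      return zeroCase;
    }
    return 1.0 - dot / denominator;
  }

  void add(double[] p) {
    points.add(p);
  }

  int size() {
    return points.size();
  }

  /**
   * Removes the stored vector nearest to query. The query should itself be stored,
   * so its nearest neighbour must lie within epsilon; otherwise this throws,
   * mimicking the check that trips for a zero centroid.
   */
  void removeExact(double[] query, double epsilon) {
    int best = -1;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int i = 0; i < points.size(); i++) {
      double d = distance.applyAsDouble(query, points.get(i));
      if (d < bestDistance) {
        bestDistance = d;
        best = i;
      }
    }
    if (best < 0 || bestDistance > epsilon) {
      throw new IllegalStateException("nearest vector is at distance " + bestDistance);
    }
    points.remove(best);
  }
}
```

A searcher built with `(a, b) -> BruteSearcherSketch.cosineDistance(a, b, 1.0)` throws when asked to remove a stored zero centroid, while one built with `0.0` removes it. With `0.0`, though, the 0/0 branch also fires when only the query is zero, so every stored vector sits at distance 0 from a zero query and the first one found is removed, which may not be the centroid itself.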
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It sounds pretty undefined, but I would tend to define the distance as
>>>>>>>>> 0 in this case, of course. And that means defining the cosine as 1.
>>>>>>>>> Which class in particular? There are a few implementations of this
>>>>>>>>> distance measure.
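Written out, assuming the distance is defined as one minus the cosine of the angle between the vectors, that convention amounts to:

```latex
d(x, y) = 1 - \cos\theta_{xy} = 1 - \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert},
\qquad
x = y = \mathbf{0}:\ \frac{0}{0} \text{ is undefined, so set } \cos\theta_{\mathbf{0}\mathbf{0}} := 1
\;\Rightarrow\; d(\mathbf{0}, \mathbf{0}) = 1 - 1 = 0.
```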
>>>>>>>>>
>>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <[email protected]> wrote:
>>>>>>>>>> In the case where both vectors are all zeros, the angle between them
>>>>>>>>>> is 0, so the cosine is 1 and the distance returned should therefore
>>>>>>>>>> be 0 (unless I misunderstood what the distance does).
>>>>>>>>>>
>>>>>>>>>> In Mahout, however, when calling distance(), if both the denominator
>>>>>>>>>> and dotProduct are 0 (which is true when both vectors are 0), the
>>>>>>>>>> returned value is 1.
>>>>>>>>>>
>>>>>>>>>> This looks like a bug to me, and I would open a JIRA issue and fix it,
>>>>>>>>>> but I want to make sure there's nothing I could possibly be missing.
>>>>>>>>>>
>>>>>>>>>> Thoughts?
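A minimal sketch of the branch as described, in plain Java over double[] rather than Mahout's Vector. The class and method names are hypothetical; this is not the actual CosineDistanceMeasure source. `distanceReported` returns 1 for the 0/0 case and `distanceProposed` returns 0. Because the denominator is the product of the two norms, the 0/0 condition is also met when only one of the vectors is zero.

```java
/** Hypothetical reproduction of the zero-vector branch; not Mahout's source. */
final class CosineDistanceSketch {
  private CosineDistanceSketch() {}

  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Behaviour as reported: the 0/0 case returns 1. */
  static double distanceReported(double[] a, double[] b) {
    double dotProduct = dot(a, b);
    double denominator = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
    if (denominator == 0.0 && dotProduct == 0.0) {
      return 1.0;
    }
    return 1.0 - dotProduct / denominator;
  }

  /** Proposed behaviour: the 0/0 case returns 0, so d(0, 0) = 0. */
  static double distanceProposed(double[] a, double[] b) {
    double dotProduct = dot(a, b);
    double denominator = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
    if (denominator == 0.0 && dotProduct == 0.0) {
      return 0.0;
    }
    return 1.0 - dotProduct / denominator;
  }
}
```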
