You can ignore the recommender stuff for the DistanceMeasure classes, as the recommenders use their own distance/similarity implementations.
I just wanted to comment on the example that Andrew gave, to mention that there are some common pitfalls with modeling ratings/interactions.

On 04.04.2013 23:14, Dan Filimon wrote:
> Right, that's fair. So, you're saying there needs to be a special value
> when both vectors are 0 for the recommender system to work?
> And that 0 means dislike, which isn't in fact accurate. You want to convey
> lack of information.
>
> But now, the code returns 1. Is that a special value? I'd guess it means
> you like it by default...?
>
>
> On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <[email protected]>
> wrote:
>
>> In recommender systems, it's dangerous to interpret "no interaction" as
>> dislike. Think of all movies you never watched, do you really dislike
>> them all? :)
>>
>>
>> On 04.04.2013 23:03, Andrew Musselman wrote:
>>> I agree; I mis-spoke before if I said "dislike". Zero to me means
>>> literally nothing. No interaction. Which could be either "don't like",
>>> "don't like today", "dislike", etc. Which adds to the meaninglessness of
>>> it.
>>>
>>>
>>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter
>>> <[email protected]> wrote:
>>>
>>>> I think that in our recommender code, 0 should mean no rating or no
>>>> interaction observed. I think modeling dislike with 0 creates lot of
>>>> unnecessary problems.
>>>>
>>>> On 04.04.2013 22:56, Andrew Musselman wrote:
>>>>> I see the arguments for having it defined, just raising the point that it's
>>>>> a very strange spot to be in.
>>>>>
>>>>> If all users are zero except for one person who likes the lentil soup, then
>>>>> the other users are equally different from that person.
>>>>>
>>>>> The problem for me is the discontinuity Sean mentions, where at zero you go
>>>>> off a cliff and have no sense of distance.
>>>>>
>>>>> But for convenience and "behaving nicely" I'm fine with distance between
>>>>> zero vectors being zero.
>>>>>
>>>>>
>>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> While I agree that it's fairly meaningless mathematically, this ensures
>>>>>> that the distance between two vectors that are the same is 0 always holds.
>>>>>> Think of yourself using this class through the DistanceMeasure interface.
>>>>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
>>>>>>
>>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I think it should return an "undefined" symbol. There is no angle between
>>>>>>> two zero vectors.
>>>>>>>
>>>>>>> In a practical sense, taking two zero vectors to be equivalent in the
>>>>>>> context of user-item vectors, say, is dodgy in my opinion. That is akin to
>>>>>>> saying "If we both hate everything on this restaurant's menu we are the
>>>>>>> same person."
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Suneel is right. :)
>>>>>>>>
>>>>>>>> Let me explain how this came up:
>>>>>>>> - When clustering, and assigning a point to a cluster, the centroid needs
>>>>>>>> to be updated.
>>>>>>>> - To update the centroid in the nearest neighbor searcher classes, the
>>>>>>>> centroid must first be removed.
>>>>>>>> - To remove the centroid, we get the closest vector (search for it, and it
>>>>>>>> should be itself) and then remove it from the data structures.
>>>>>>>> => However, when the centroid is 0, the nearest vector (which should be
>>>>>>>> itself) has a huge distance (1 rather than 0) and this trips a check.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It sounds pretty undefined, but I would tend to define the distance as
>>>>>>>>> 0 in this case of course. And that means defining the cosine as 1.
>>>>>>>>> Which class in particular? There are a few implementations of this
>>>>>>>>> distance measure.
>>>>>>>>>
>>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> In the case where both vectors are all zeros, the angle between them is 0,
>>>>>>>>>> so the cosine is therefore 1 and the so the distance returned should be 0
>>>>>>>>>> (unless I misunderstood what the distance does).
>>>>>>>>>>
>>>>>>>>>> In Mahout, when calling distance() however, if both the denominator and
>>>>>>>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
>>>>>>>>>> value is 1.
>>>>>>>>>>
>>>>>>>>>> This looks like a bug to me and I would open a JIRA issue and fix it but I
>>>>>>>>>> want to make sure there's nothing I could possibly be missing.
>>>>>>>>>>
>>>>>>>>>> Thoughts?
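
For reference, the convention argued for in the quoted thread (distance 0 between two all-zero vectors, so that d(x, x) = 0 always holds) could be sketched roughly as below. This is only an illustration on plain double[] arrays, not Mahout's actual code: the class name CosineDistanceSketch is made up, and returning 1 when exactly one of the vectors is zero is my own assumption, since the thread only discusses the both-zero case.

```java
// Hypothetical sketch, not the Mahout implementation.
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Cosine distance 1 - cos(a, b), with an explicit convention for zero vectors. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dotProduct = 0.0;
        double squaredNormA = 0.0;
        double squaredNormB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            squaredNormA += a[i] * a[i];
            squaredNormB += b[i] * b[i];
        }
        double denominator = Math.sqrt(squaredNormA) * Math.sqrt(squaredNormB);
        if (denominator == 0.0) {
            // The angle is undefined here. Convention from the thread: two zero
            // vectors are identical, so their distance is 0. Assumption of this
            // sketch: if only one vector is zero, return 1.
            return (squaredNormA == 0.0 && squaredNormB == 0.0) ? 0.0 : 1.0;
        }
        // Clamp to [-1, 1] to absorb floating-point rounding.
        double cosine = Math.max(-1.0, Math.min(1.0, dotProduct / denominator));
        return 1.0 - cosine;
    }
}
```

With this convention, removing a zero centroid by searching for its nearest neighbor would find the centroid itself at distance 0, which is the case Dan describes. It says nothing about how a recommender should treat "no interaction"; that is a separate modeling question.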
