On 04.04.2013 23:22, Dan Filimon wrote:
> Ah, okay then. :)
> I thought that you depend on the current convention that it returns 1. So,
> disclaimers aside, you're fine with the change?

Yes, I concur that the distance between two identical vectors should be zero.

>
> On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
>
>> You can ignore the recommender stuff for the DistanceMeasure classes, as
>> the recommenders use their own distance/similarity implementations.
>>
>> I just wanted to comment on the example that Andrew gave, to mention
>> that there are some common pitfalls with modeling ratings/interactions.
>>
>> On 04.04.2013 23:14, Dan Filimon wrote:
>>> Right, that's fair. So, you're saying there needs to be a special value
>>> when both vectors are 0 for the recommender system to work?
>>> And that 0 means dislike, which isn't in fact accurate. You want to convey
>>> lack of information.
>>>
>>> But now, the code returns 1. Is that a special value? I'd guess it means
>>> you like it by default...?
>>>
>>> On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
>>>
>>>> In recommender systems, it's dangerous to interpret "no interaction" as
>>>> dislike. Think of all movies you never watched, do you really dislike
>>>> them all? :)
>>>>
>>>> On 04.04.2013 23:03, Andrew Musselman wrote:
>>>>> I agree; I mis-spoke before if I said "dislike". Zero to me means
>>>>> literally nothing. No interaction. Which could be either "don't like",
>>>>> "don't like today", "dislike", etc. Which adds to the meaninglessness of
>>>>> it.
>>>>>
>>>>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
>>>>>
>>>>>> I think that in our recommender code, 0 should mean no rating or no
>>>>>> interaction observed. I think modeling dislike with 0 creates a lot of
>>>>>> unnecessary problems.
>>>>>>
>>>>>> On 04.04.2013 22:56, Andrew Musselman wrote:
>>>>>>> I see the arguments for having it defined, just raising the point that it's
>>>>>>> a very strange spot to be in.
>>>>>>>
>>>>>>> If all users are zero except for one person who likes the lentil soup, then
>>>>>>> the other users are equally different from that person.
>>>>>>>
>>>>>>> The problem for me is the discontinuity Sean mentions, where at zero you go
>>>>>>> off a cliff and have no sense of distance.
>>>>>>>
>>>>>>> But for convenience and "behaving nicely" I'm fine with distance between
>>>>>>> zero vectors being zero.
>>>>>>>
>>>>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
>>>>>>>
>>>>>>>> While I agree that it's fairly meaningless mathematically, this ensures
>>>>>>>> that the distance between two vectors that are the same is always 0.
>>>>>>>> Think of yourself using this class through the DistanceMeasure interface.
>>>>>>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
>>>>>>>>
>>>>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
>>>>>>>>
>>>>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I think it should return an "undefined" symbol. There is no angle between
>>>>>>>>> two zero vectors.
>>>>>>>>>
>>>>>>>>> In a practical sense, taking two zero vectors to be equivalent in the
>>>>>>>>> context of user-item vectors, say, is dodgy in my opinion. That is akin to
>>>>>>>>> saying "If we both hate everything on this restaurant's menu we are the
>>>>>>>>> same person."
>>>>>>>>>
>>>>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Suneel is right. :)
>>>>>>>>>>
>>>>>>>>>> Let me explain how this came up:
>>>>>>>>>> - When clustering, and assigning a point to a cluster, the centroid needs
>>>>>>>>>> to be updated.
>>>>>>>>>> - To update the centroid in the nearest neighbor searcher classes, the
>>>>>>>>>> centroid must first be removed.
>>>>>>>>>> - To remove the centroid, we get the closest vector (search for it, and it
>>>>>>>>>> should be itself) and then remove it from the data structures.
>>>>>>>>>> => However, when the centroid is 0, the nearest vector (which should be
>>>>>>>>>> itself) has a huge distance (1 rather than 0) and this trips a check.
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> It sounds pretty undefined, but I would tend to define the distance as
>>>>>>>>>>> 0 in this case of course. And that means defining the cosine as 1.
>>>>>>>>>>> Which class in particular? There are a few implementations of this
>>>>>>>>>>> distance measure.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
>>>>>>>>>>>> In the case where both vectors are all zeros, the angle between them is 0,
>>>>>>>>>>>> so the cosine is therefore 1 and so the distance returned should be 0
>>>>>>>>>>>> (unless I misunderstood what the distance does).
>>>>>>>>>>>>
>>>>>>>>>>>> In Mahout, when calling distance() however, if both the denominator and
>>>>>>>>>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
>>>>>>>>>>>> value is 1.
>>>>>>>>>>>>
>>>>>>>>>>>> This looks like a bug to me and I would open a JIRA issue and fix it but I
>>>>>>>>>>>> want to make sure there's nothing I could possibly be missing.
>>>>>>>>>>>>
>>>>>>>>>>>> Thoughts?
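
For reference, the convention agreed on in this thread (distance 0 between two all-zero vectors) can be sketched roughly as below. This is only an illustration and not Mahout's actual code: the class name CosineDistanceSketch and the plain double[] representation are hypothetical, and returning 1 when exactly one vector is zero is an assumption of this sketch, since the thread only settles the case where both are zero.

```java
// Hypothetical sketch of cosine distance with explicit zero-vector handling.
// Not Mahout's DistanceMeasure implementation; names are illustrative only.
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Returns 1 - cos(a, b), with the zero-vector corner cases handled explicitly. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normSqA = 0.0;
        double normSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normSqA += a[i] * a[i];
            normSqB += b[i] * b[i];
        }
        // Both vectors are all zeros: they are identical, so d(x, x) = 0.
        if (normSqA == 0.0 && normSqB == 0.0) {
            return 0.0;
        }
        // Exactly one vector is zero: the angle is undefined.
        // ASSUMPTION of this sketch: treat the pair as orthogonal (distance 1).
        if (normSqA == 0.0 || normSqB == 0.0) {
            return 1.0;
        }
        double cosine = dot / (Math.sqrt(normSqA) * Math.sqrt(normSqB));
        // Clamp to [-1, 1] to absorb floating-point rounding.
        cosine = Math.max(-1.0, Math.min(1.0, cosine));
        return 1.0 - cosine;
    }
}
```

With this convention, searching for an all-zero centroid finds itself at distance 0, so a check like the one described for the nearest neighbor searcher would no longer be tripped by that case.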