I think that in our recommender code, 0 should mean no rating or no interaction observed. Modeling dislike with 0 creates a lot of unnecessary problems.

On 04.04.2013 22:56, Andrew Musselman wrote:
> I see the arguments for having it defined, just raising the point that it's
> a very strange spot to be in.
>
> If all users are zero except for one person who likes the lentil soup, then
> the other users are equally different from that person.
>
> The problem for me is the discontinuity Sean mentions, where at zero you go
> off a cliff and have no sense of distance.
>
> But for convenience and "behaving nicely" I'm fine with distance between
> zero vectors being zero.
>
>
> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon
> <dangeorge.fili...@gmail.com> wrote:
>
>> While I agree that it's fairly meaningless mathematically, this ensures
>> that the distance between two vectors that are the same is 0 always holds.
>> Think of yourself using this class through the DistanceMeasure interface.
>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
>>
>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
>>
>>
>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman
>> <andrew.mussel...@gmail.com> wrote:
>>
>>> I think it should return an "undefined" symbol. There is no angle between
>>> two zero vectors.
>>>
>>> In a practical sense, taking two zero vectors to be equivalent in the
>>> context of user-item vectors, say, is dodgy in my opinion. That is akin to
>>> saying "If we both hate everything on this restaurant's menu we are the
>>> same person."
>>>
>>>
>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon
>>> <dangeorge.fili...@gmail.com> wrote:
>>>
>>>> Suneel is right. :)
>>>>
>>>> Let me explain how this came up:
>>>> - When clustering, and assigning a point to a cluster, the centroid needs
>>>> to be updated.
>>>> - To update the centroid in the nearest neighbor searcher classes, the
>>>> centroid must first be removed.
>>>> - To remove the centroid, we get the closest vector (search for it, and it
>>>> should be itself) and then remove it from the data structures.
>>>> => However, when the centroid is 0, the nearest vector (which should be
>>>> itself) has a huge distance (1 rather than 0) and this trips a check.
>>>>
>>>>
>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> It sounds pretty undefined, but I would tend to define the distance as
>>>>> 0 in this case of course. And that means defining the cosine as 1.
>>>>> Which class in particular? There are a few implementations of this
>>>>> distance measure.
>>>>>
>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon
>>>>> <dangeorge.fili...@gmail.com> wrote:
>>>>>> In the case where both vectors are all zeros, the angle between them is 0,
>>>>>> so the cosine is therefore 1 and the so the distance returned should be 0
>>>>>> (unless I misunderstood what the distance does).
>>>>>>
>>>>>> In Mahout, when calling distance() however, if both the denominator and
>>>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
>>>>>> value is 1.
>>>>>>
>>>>>> This looks like a bug to me and I would open a JIRA issue and fix it but I
>>>>>> want to make sure there's nothing I could possibly be missing.
>>>>>>
>>>>>> Thoughts?
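
For reference, the convention discussed in the quoted thread (distance 0 between two all-zero vectors) could be sketched roughly as below. This is a standalone illustration, not Mahout's actual distance() code: the class name CosineDistanceSketch and the method cosineDistance are hypothetical, and returning 1 when exactly one vector is zero is an assumption the thread does not settle.

```java
// Hypothetical sketch, not Mahout's DistanceMeasure implementation.
final class CosineDistanceSketch {

    // Cosine distance (1 - cosine similarity) with an explicit zero-vector convention.
    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dotProduct = 0.0;
        double normA = 0.0; // squared length of a
        double normB = 0.0; // squared length of b
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        if (denominator == 0.0) {
            // The angle is undefined here. Convention from the thread:
            // two zero vectors are identical, so d(x, x) = 0.
            // Assumption: exactly one zero vector gives the maximum distance 1.
            return (normA == 0.0 && normB == 0.0) ? 0.0 : 1.0;
        }
        // Clamp to [-1, 1] to absorb floating-point rounding.
        double cosine = Math.max(-1.0, Math.min(1.0, dotProduct / denominator));
        return 1.0 - cosine;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0};
        double[] soup = {1.0, 0.0};
        System.out.println(cosineDistance(zero, zero)); // both zero
        System.out.println(cosineDistance(zero, soup)); // exactly one zero
    }
}
```

With this convention, a zero centroid looked up against itself comes back at distance 0, so the removal check Dan describes would not trip. The discontinuity Andrew mentions is still there: every non-zero vector is at distance 1 from the zero vector.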