In recommender systems, it's dangerous to interpret "no interaction" as dislike. Think of all the movies you have never watched: do you really dislike them all? :)
On 04.04.2013 23:03, Andrew Musselman wrote:
> I agree; I misspoke before if I said "dislike". Zero to me means
> literally nothing. No interaction. Which could be either "don't like",
> "don't like today", "dislike", etc. Which adds to the meaninglessness of
> it.
>
>
> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter
> <ssc.o...@googlemail.com> wrote:
>
>> I think that in our recommender code, 0 should mean no rating or no
>> interaction observed. I think modeling dislike with 0 creates a lot of
>> unnecessary problems.
>>
>> On 04.04.2013 22:56, Andrew Musselman wrote:
>>> I see the arguments for having it defined, just raising the point that
>>> it's a very strange spot to be in.
>>>
>>> If all users are zero except for one person who likes the lentil soup,
>>> then the other users are equally different from that person.
>>>
>>> The problem for me is the discontinuity Sean mentions, where at zero you
>>> go off a cliff and have no sense of distance.
>>>
>>> But for convenience and "behaving nicely" I'm fine with distance between
>>> zero vectors being zero.
>>>
>>>
>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon
>>> <dangeorge.fili...@gmail.com> wrote:
>>>
>>>> While I agree that it's fairly meaningless mathematically, this ensures
>>>> that "the distance between two identical vectors is 0" always holds.
>>>> Think of yourself using this class through the DistanceMeasure
>>>> interface. The implicit expectation [1] here is that d(x, y) = 0 iff
>>>> x = y.
>>>>
>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
>>>>
>>>>
>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman
>>>> <andrew.mussel...@gmail.com> wrote:
>>>>
>>>>> I think it should return an "undefined" symbol. There is no angle
>>>>> between two zero vectors.
>>>>>
>>>>> In a practical sense, taking two zero vectors to be equivalent in the
>>>>> context of user-item vectors, say, is dodgy in my opinion.
>>>>> That is akin to saying "If we both hate everything on this restaurant's
>>>>> menu, we are the same person."
>>>>>
>>>>>
>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon
>>>>> <dangeorge.fili...@gmail.com> wrote:
>>>>>
>>>>>> Suneel is right. :)
>>>>>>
>>>>>> Let me explain how this came up:
>>>>>> - When clustering, and assigning a point to a cluster, the centroid
>>>>>> needs to be updated.
>>>>>> - To update the centroid in the nearest neighbor searcher classes, the
>>>>>> centroid must first be removed.
>>>>>> - To remove the centroid, we get the closest vector (search for it, and
>>>>>> it should be itself) and then remove it from the data structures.
>>>>>> => However, when the centroid is 0, the nearest vector (which should be
>>>>>> itself) has a huge distance (1 rather than 0) and this trips a check.
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>>>
>>>>>>> It sounds pretty undefined, but I would tend to define the distance as
>>>>>>> 0 in this case, of course. And that means defining the cosine as 1.
>>>>>>> Which class in particular? There are a few implementations of this
>>>>>>> distance measure.
>>>>>>>
>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon
>>>>>>> <dangeorge.fili...@gmail.com> wrote:
>>>>>>>> In the case where both vectors are all zeros, the angle between them
>>>>>>>> is 0, so the cosine is 1 and therefore the distance returned should
>>>>>>>> be 0 (unless I misunderstood what the distance does).
>>>>>>>>
>>>>>>>> In Mahout, when calling distance(), however, if both the denominator
>>>>>>>> and dotProduct are 0 (which is true when both vectors are 0), the
>>>>>>>> returned value is 1.
>>>>>>>>
>>>>>>>> This looks like a bug to me and I would open a JIRA issue and fix it,
>>>>>>>> but I want to make sure there's nothing I could possibly be missing.
>>>>>>>>
>>>>>>>> Thoughts?
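The convention Sean proposes (define the cosine of two all-zero vectors as 1, so their distance is 0) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mahout's implementation: `naive_cosine_distance`, `cosine_distance` and `remove_self` are hypothetical names; the naive version only mimics the behaviour described in the bug report (the 0/0 case returns 1); returning 1.0 when exactly one vector is zero is an assumption; and `remove_self` is a simplified stand-in for the centroid-removal check, not the actual searcher code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def _dot_and_norms(a: Vector, b: Vector) -> Tuple[float, float, float]:
    """Return (a . b, |a|, |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot, norm_a, norm_b


def naive_cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cos(a, b), returning 1.0 whenever the denominator is 0.

    Mimics the behaviour described in the bug report (0/0 -> 1).
    Illustration only; this is not Mahout's source.
    """
    dot, norm_a, norm_b = _dot_and_norms(a, b)
    denominator = norm_a * norm_b
    if denominator == 0.0:
        return 1.0
    return 1.0 - dot / denominator


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cos(a, b), defining the cosine of two all-zero vectors as 1.

    Assumption: if exactly one vector is zero, the cosine is treated
    as 0, so the distance is 1.0.
    """
    dot, norm_a, norm_b = _dot_and_norms(a, b)
    if norm_a == 0.0 and norm_b == 0.0:
        return 0.0  # both zero: d(x, x) == 0 now also holds for x = 0
    denominator = norm_a * norm_b
    if denominator == 0.0:
        return 1.0  # exactly one zero vector: angle undefined, pick 1.0
    return 1.0 - dot / denominator


def remove_self(points: List[Vector], target: Vector,
                dist: Callable[[Vector, Vector], float] = cosine_distance,
                tol: float = 1e-9) -> List[Vector]:
    """Hypothetical stand-in for the centroid-removal step: find the
    nearest stored vector (which should be `target` itself), check that
    its distance is ~0, and return the list without it."""
    best = min(range(len(points)), key=lambda i: dist(points[i], target))
    if dist(points[best], target) > tol:
        raise LookupError("nearest vector is not the target itself")
    return points[:best] + points[best + 1:]
```

With `naive_cosine_distance`, removing a zero centroid from `[[1.0, 0.0], [0.0, 0.0]]` raises `LookupError`, because every stored vector is at distance 1 from it; with `cosine_distance` the zero centroid finds itself at distance 0. Under this convention the all-zero users in the lentil soup example are at distance 0 from one another and at distance 1 from the single user who likes the soup. Note that the convention only restores d(x, x) = 0 for the zero vector: cosine distance is still 0 for any two vectors pointing in the same direction, such as [1, 0] and [2, 0], so d(x, y) = 0 does not imply x = y.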