I agree 1 is wrong :)
On Thu, Apr 4, 2013 at 2:22 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> Ah, okay then. :)
> I thought that you depend on the current convention that it returns 1. So,
> disclaimers aside, you're fine with the change?
>
> On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
>
> > You can ignore the recommender stuff for the DistanceMeasure classes, as
> > the recommenders use their own distance/similarity implementations.
> >
> > I just wanted to comment on the example that Andrew gave, to mention
> > that there are some common pitfalls with modeling ratings/interactions.
> >
> > On 04.04.2013 23:14, Dan Filimon wrote:
> > > Right, that's fair. So, you're saying there needs to be a special value
> > > when both vectors are 0 for the recommender system to work?
> > > And that 0 means dislike, which isn't in fact accurate. You want to convey
> > > lack of information.
> > >
> > > But now, the code returns 1. Is that a special value? I'd guess it means
> > > you like it by default...?
> > >
> > > On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> > >
> > >> In recommender systems, it's dangerous to interpret "no interaction" as
> > >> dislike. Think of all movies you never watched, do you really dislike
> > >> them all? :)
> > >>
> > >> On 04.04.2013 23:03, Andrew Musselman wrote:
> > >>> I agree; I mis-spoke before if I said "dislike". Zero to me means
> > >>> literally nothing. No interaction. Which could be either "don't like",
> > >>> "don't like today", "dislike", etc. Which adds to the meaninglessness of
> > >>> it.
> > >>>
> > >>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter
> > >>> <ssc.o...@googlemail.com> wrote:
> > >>>
> > >>>> I think that in our recommender code, 0 should mean no rating or no
> > >>>> interaction observed. I think modeling dislike with 0 creates a lot of
> > >>>> unnecessary problems.
> > >>>>
> > >>>> On 04.04.2013 22:56, Andrew Musselman wrote:
> > >>>>> I see the arguments for having it defined, just raising the point that it's
> > >>>>> a very strange spot to be in.
> > >>>>>
> > >>>>> If all users are zero except for one person who likes the lentil soup, then
> > >>>>> the other users are equally different from that person.
> > >>>>>
> > >>>>> The problem for me is the discontinuity Sean mentions, where at zero you go
> > >>>>> off a cliff and have no sense of distance.
> > >>>>>
> > >>>>> But for convenience and "behaving nicely" I'm fine with distance between
> > >>>>> zero vectors being zero.
> > >>>>>
> > >>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > >>>>>
> > >>>>>> While I agree that it's fairly meaningless mathematically, this ensures
> > >>>>>> that the distance between two vectors that are the same is always 0.
> > >>>>>> Think of yourself using this class through the DistanceMeasure interface.
> > >>>>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
> > >>>>>>
> > >>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> > >>>>>>
> > >>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> I think it should return an "undefined" symbol. There is no angle between
> > >>>>>>> two zero vectors.
> > >>>>>>>
> > >>>>>>> In a practical sense, taking two zero vectors to be equivalent in the
> > >>>>>>> context of user-item vectors, say, is dodgy in my opinion. That is akin to
> > >>>>>>> saying "If we both hate everything on this restaurant's menu we are the
> > >>>>>>> same person."
> > >>>>>>>
> > >>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Suneel is right. :)
> > >>>>>>>>
> > >>>>>>>> Let me explain how this came up:
> > >>>>>>>> - When clustering, and assigning a point to a cluster, the centroid needs
> > >>>>>>>> to be updated.
> > >>>>>>>> - To update the centroid in the nearest neighbor searcher classes, the
> > >>>>>>>> centroid must first be removed.
> > >>>>>>>> - To remove the centroid, we get the closest vector (search for it, and it
> > >>>>>>>> should be itself) and then remove it from the data structures.
> > >>>>>>>> => However, when the centroid is 0, the nearest vector (which should be
> > >>>>>>>> itself) has a huge distance (1 rather than 0) and this trips a check.
> > >>>>>>>>
> > >>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> It sounds pretty undefined, but I would tend to define the distance as
> > >>>>>>>>> 0 in this case of course. And that means defining the cosine as 1.
> > >>>>>>>>> Which class in particular? There are a few implementations of this
> > >>>>>>>>> distance measure.
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > >>>>>>>>>> In the case where both vectors are all zeros, the angle between them is 0,
> > >>>>>>>>>> so the cosine is therefore 1 and so the distance returned should be 0
> > >>>>>>>>>> (unless I misunderstood what the distance does).
> > >>>>>>>>>>
> > >>>>>>>>>> In Mahout, when calling distance() however, if both the denominator and
> > >>>>>>>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
> > >>>>>>>>>> value is 1.
> > >>>>>>>>>>
> > >>>>>>>>>> This looks like a bug to me and I would open a JIRA issue and fix it but I
> > >>>>>>>>>> want to make sure there's nothing I could possibly be missing.
> > >>>>>>>>>>
> > >>>>>>>>>> Thoughts?
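
For reference, the behavior the thread converges on (cosine distance of 0 between two all-zero vectors) might look roughly like the sketch below. It is not Mahout's source: the class name CosineDistanceSketch and the plain double[] signature are made up for illustration, and only the names dotProduct and denominator are borrowed from the original message. Returning 1 when exactly one vector is zero is an assumption of this sketch, since the thread does not settle that case.

```java
// Hypothetical sketch, not Mahout's actual implementation.
final class CosineDistanceSketch {

    private CosineDistanceSketch() {}

    /** Cosine distance 1 - cos(a, b) for two dense vectors of equal length. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dotProduct = 0.0;
        double normSquaredA = 0.0;
        double normSquaredB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            normSquaredA += a[i] * a[i];
            normSquaredB += b[i] * b[i];
        }

        // Corner case from the thread: two all-zero vectors count as identical,
        // so d(x, x) = 0 holds for the zero vector as well.
        if (normSquaredA == 0.0 && normSquaredB == 0.0) {
            return 0.0;
        }

        double denominator = Math.sqrt(normSquaredA) * Math.sqrt(normSquaredB);
        if (denominator == 0.0) {
            // One vector is zero (or the product underflowed). The thread does not
            // decide this case; returning 1 is an assumption of this sketch.
            return 1.0;
        }

        // Clamp the cosine to [-1, 1] to absorb floating-point rounding.
        double cosine = Math.max(-1.0, Math.min(1.0, dotProduct / denominator));
        return 1.0 - cosine;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0, 0.0};
        double[] x = {1.0, 2.0, 3.0};
        System.out.println(distance(zero, zero)); // both vectors zero
        System.out.println(distance(x, x));       // identical non-zero vectors
        System.out.println(distance(zero, x));    // exactly one zero vector
    }
}
```

With the zero/zero case returning 0, a nearest-neighbor search for a zero centroid can find the centroid itself at distance 0, which is the check that tripped in the clustering code described above.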