I'm not familiar with the recommender code at all. I was only thinking of the clustering. How is dislike related to the cosine distance?
Also, CosineDistanceMeasure isn't really behaving like a distance measure in this case (the whole d(x, x) = 0 thing). Maybe it makes sense to have a subclass specifically for the recommender system?

On Fri, Apr 5, 2013 at 12:00 AM, Sebastian Schelter <[email protected]> wrote:

> I think that in our recommender code, 0 should mean no rating or no
> interaction observed. I think modeling dislike with 0 creates a lot of
> unnecessary problems.
>
> On 04.04.2013 22:56, Andrew Musselman wrote:
> > I see the arguments for having it defined, just raising the point that
> > it's a very strange spot to be in.
> >
> > If all users are zero except for one person who likes the lentil soup,
> > then the other users are equally different from that person.
> >
> > The problem for me is the discontinuity Sean mentions, where at zero
> > you go off a cliff and have no sense of distance.
> >
> > But for convenience and "behaving nicely" I'm fine with distance
> > between zero vectors being zero.
> >
> > On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <[email protected]> wrote:
> >
> >> While I agree that it's fairly meaningless mathematically, this
> >> ensures that "the distance between two vectors that are the same
> >> is 0" always holds. Think of yourself using this class through the
> >> DistanceMeasure interface. The implicit expectation [1] here is
> >> that d(x, y) = 0 iff x = y.
> >>
> >> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> >>
> >> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <[email protected]> wrote:
> >>
> >>> I think it should return an "undefined" symbol. There is no angle
> >>> between two zero vectors.
> >>>
> >>> In a practical sense, taking two zero vectors to be equivalent in
> >>> the context of user-item vectors, say, is dodgy in my opinion. That
> >>> is akin to saying "If we both hate everything on this restaurant's
> >>> menu we are the same person."
> >>>
> >>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <[email protected]> wrote:
> >>>
> >>>> Suneel is right. :)
> >>>>
> >>>> Let me explain how this came up:
> >>>> - When clustering, and assigning a point to a cluster, the
> >>>>   centroid needs to be updated.
> >>>> - To update the centroid in the nearest neighbor searcher classes,
> >>>>   the centroid must first be removed.
> >>>> - To remove the centroid, we get the closest vector (search for
> >>>>   it, and it should be itself) and then remove it from the data
> >>>>   structures.
> >>>> => However, when the centroid is 0, the nearest vector (which
> >>>>    should be itself) has a huge distance (1 rather than 0) and
> >>>>    this trips a check.
> >>>>
> >>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <[email protected]> wrote:
> >>>>
> >>>>> It sounds pretty undefined, but I would tend to define the
> >>>>> distance as 0 in this case of course. And that means defining
> >>>>> the cosine as 1. Which class in particular? There are a few
> >>>>> implementations of this distance measure.
> >>>>>
> >>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <[email protected]> wrote:
> >>>>>
> >>>>>> In the case where both vectors are all zeros, the angle between
> >>>>>> them is 0, so the cosine is therefore 1 and so the distance
> >>>>>> returned should be 0 (unless I misunderstood what the distance
> >>>>>> does).
> >>>>>>
> >>>>>> In Mahout, when calling distance() however, if both the
> >>>>>> denominator and dotProduct are 0 (which is true when both
> >>>>>> vectors are 0), the returned value is 1.
> >>>>>>
> >>>>>> This looks like a bug to me and I would open a JIRA issue and
> >>>>>> fix it but I want to make sure there's nothing I could possibly
> >>>>>> be missing.
> >>>>>>
> >>>>>> Thoughts?
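
To make the zero-vector case from the thread concrete, here is a minimal sketch in plain Java. It is not Mahout's CosineDistanceMeasure: the class CosineDistanceSketch and its methods are hypothetical, and the conventions are assumptions taken from the discussion (two all-zero vectors are treated as identical, so their distance is 0; a zero vector against a non-zero vector is given distance 1, which the thread does not settle).

```java
/** Hypothetical illustration only; not the Mahout implementation. */
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Cosine distance 1 - cos(a, b), with the zero-vector cases made explicit. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        if (denominator == 0.0) {
            // The angle is undefined here. Assumed convention: two zero
            // vectors are the same vector, so d = 0; otherwise d = 1.
            return (normA == 0.0 && normB == 0.0) ? 0.0 : 1.0;
        }
        // Clamp to [-1, 1] to guard against floating-point rounding.
        double cosine = Math.max(-1.0, Math.min(1.0, dot / denominator));
        return 1.0 - cosine;
    }

    /** Brute-force nearest neighbour: index of the row in data closest to query. */
    static int nearest(double[][] data, double[] query) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < data.length; i++) {
            double d = distance(data[i], query);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

With this convention, searching for an all-zero centroid finds the centroid itself at distance 0, so a "remove the closest vector, which should be me" step would not trip a distance check. If the zero/zero case returned 1 instead, the zero centroid would be no closer to itself than to any other vector.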
