I'm not familiar with the recommender code at all. I was only thinking of
the clustering.
How is dislike related to the cosine distance?

Also, CosineDistanceMeasure isn't really behaving like a measure in this
case (the whole d(x, x) = 0 thing). Maybe it makes sense to have a separate
subclass specifically for the recommender system?
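
To make the d(x, x) = 0 point concrete, here is a minimal sketch of a cosine
distance that adopts the convention argued for in the quoted thread: two
all-zero vectors get distance 0, and a zero vector against a non-zero one gets
distance 1. This is not Mahout's CosineDistanceMeasure; the class name
CosineDistanceSketch and the use of plain double[] arrays are assumptions made
purely for illustration.

```java
// Hypothetical illustration only; not the Mahout implementation.
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Returns 1 - cos(a, b), with explicit conventions for zero vectors. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normSqA = 0.0;
        double normSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normSqA += a[i] * a[i];
            normSqB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normSqA) * Math.sqrt(normSqB);
        if (denominator == 0.0) {
            // The cosine is undefined here, so this is a convention:
            // both vectors zero -> treat them as identical (distance 0);
            // exactly one zero  -> treat them as maximally dissimilar (distance 1).
            return (normSqA == 0.0 && normSqB == 0.0) ? 0.0 : 1.0;
        }
        // Clamp to [-1, 1] to absorb floating-point rounding.
        double cosine = Math.max(-1.0, Math.min(1.0, dot / denominator));
        return 1.0 - cosine;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0};
        double[] soup = {1.0, 0.0};
        System.out.println(distance(zero, zero)); // zero vs. zero
        System.out.println(distance(zero, soup)); // zero vs. non-zero
        System.out.println(distance(soup, soup)); // a vector vs. itself
    }
}
```

With this convention a nearest-neighbor search for an all-zero centroid finds
the centroid itself at distance 0, which is the property the clustering code
relies on. Whether the same convention suits user-item vectors is the open
question in the thread below.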


On Fri, Apr 5, 2013 at 12:00 AM, Sebastian Schelter <[email protected]
> wrote:

> I think that in our recommender code, 0 should mean no rating or no
> interaction observed. I think modeling dislike with 0 creates a lot of
> unnecessary problems.
>
> On 04.04.2013 22:56, Andrew Musselman wrote:
> > I see the arguments for having it defined, just raising the point that
> > it's a very strange spot to be in.
> >
> > If all users are zero except for one person who likes the lentil soup,
> > then the other users are equally different from that person.
> >
> > The problem for me is the discontinuity Sean mentions, where at zero you
> > go off a cliff and have no sense of distance.
> >
> > But for convenience and "behaving nicely" I'm fine with distance between
> > zero vectors being zero.
> >
> >
> > On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <[email protected]
> >wrote:
> >
> >> While I agree that it's fairly meaningless mathematically, this ensures
> >> that the distance between two identical vectors is always 0.
> >> Think of yourself using this class through the DistanceMeasure interface.
> >> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
> >>
> >> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> >>
> >>
> >> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <
> >> [email protected]> wrote:
> >>
> >>> I think it should return an "undefined" symbol.  There is no angle
> >>> between two zero vectors.
> >>>
> >>> In a practical sense, taking two zero vectors to be equivalent in the
> >>> context of user-item vectors, say, is dodgy in my opinion.  That is akin
> >>> to saying "If we both hate everything on this restaurant's menu we are the
> >>> same person."
> >>>
> >>>
> >>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <
> >> [email protected]
> >>>> wrote:
> >>>
> >>>> Suneel is right. :)
> >>>>
> >>>> Let me explain how this came up:
> >>>> - When clustering, and assigning a point to a cluster, the centroid needs
> >>>> to be updated.
> >>>> - To update the centroid in the nearest neighbor searcher classes, the
> >>>> centroid must first be removed.
> >>>> - To remove the centroid, we get the closest vector (search for it, and it
> >>>> should be itself) and then remove it from the data structures.
> >>>> => However, when the centroid is 0, the nearest vector (which should be
> >>>> itself) has a huge distance (1 rather than 0) and this trips a check.
> >>>>
> >>>>
> >>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <[email protected]> wrote:
> >>>>
> >>>>> It sounds pretty undefined, but I would tend to define the distance as
> >>>>> 0 in this case of course. And that means defining the cosine as 1.
> >>>>> Which class in particular? There are a few implementations of this
> >>>>> distance measure.
> >>>>>
> >>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <
> >>> [email protected]
> >>>>>
> >>>>> wrote:
> >>>>>> In the case where both vectors are all zeros, the angle between them is
> >>>>>> 0, so the cosine is therefore 1 and so the distance returned should be 0
> >>>>>> (unless I misunderstood what the distance does).
> >>>>>>
> >>>>>> In Mahout, when calling distance() however, if both the denominator and
> >>>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
> >>>>>> value is 1.
> >>>>>>
> >>>>>> This looks like a bug to me and I would open a JIRA issue and fix it but
> >>>>>> I want to make sure there's nothing I could possibly be missing.
> >>>>>>
> >>>>>> Thoughts?
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
