Sebastian,

The distributed implementation will be in the next patch. Thanks for the heads up.
Daniel.

On Wed, Mar 30, 2011 at 1:35 AM, Sebastian Schelter <[email protected]> wrote:
> Hi Daniel,
>
> We would also need a "distributed" implementation of this new metric. Could
> you do that too?
>
> Shouldn't be too hard, just have a look at the other implementations in
> org.apache.mahout.math.hadoop.similarity.vector.
>
> --sebastian
>
>
> On 30.03.2011 00:40, Sean Owen wrote:
>>
>> Great, the best place for this would be a JIRA issue:
>> https://issues.apache.org/jira/browse/MAHOUT
>> I think it needs a bit of style work. For example, it ought to be very
>> much like TanimotoCoefficientSimilarity. If you copied that and edited
>> a few key methods, you'd be a lot closer I think.
>> I guess I find the core computation a little quirky:
>>
>> double distance = preferring1+preferring2 - 2*intersection;
>> if(distance< 1.0){
>>   distance=1.0-distance;
>> }else{
>>   distance = -1.0 + 1.0 / distance;
>> }
>>
>> distance is an int, so I think it's
>>
>> int distance = preferring1+preferring2 - 2*intersection;
>> if(distance == 0){
>>   distance=1;
>> }else{
>>   distance = -1.0 + 1.0 / distance;
>> }
>>
>> The resulting values are a little odd then -- it can return values in
>> [-1,0], or 1.
>>
>> By default I'd go with something more like "1.0 / (1.0 + distance)" I
>> suppose, though that's not somehow the one right way to map a distance
>> to a similarity -- though it would be consistent with
>> EuclideanDistanceSimilarity.
>>
>>
>> I'd actually welcome you to expand this idea and not just make a
>> "boolean pref" version of this but one that computes an actual
>> city-block distance for prefs with ratings too, for completeness.
>>
>>
>> I know this as "Manhattan distance". Is that an Americanism or is that
>> actually the more common name to anyone?
>>
>>
>>
>> On Tue, Mar 29, 2011 at 10:16 PM, Daniel McEnnis <[email protected]>
>> wrote:
>>>
>>> Dear,
>>>
>>> Here is a patch of a new distance metric for the collaborative
>>> filtering modules - CityBlockDistance. With the 0-1 binary split on
>>> preference, KLDistance, AHDistance, and Symmetric KLDistance don't
>>> make sense.
>>>
>>> Daniel McEnnis.
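Sean's two suggestions (a city-block distance for boolean prefs mapped to a similarity via "1.0 / (1.0 + distance)", plus a variant for prefs with ratings) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Mahout's API: the class CityBlockSketch and its methods are hypothetical, and the rated variant assumes that only items rated by both users contribute to the distance.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Mahout's API: city-block (Manhattan) distance
// between two users, mapped to a similarity via 1 / (1 + distance).
public class CityBlockSketch {

  // Boolean prefs: each user is the set of item IDs they prefer. For 0/1
  // vectors the city-block distance is |A| + |B| - 2 * |A intersect B|,
  // i.e. the number of items preferred by exactly one of the two users.
  static int booleanDistance(Set<Long> items1, Set<Long> items2) {
    int intersection = 0;
    for (Long item : items1) {
      if (items2.contains(item)) {
        intersection++;
      }
    }
    return items1.size() + items2.size() - 2 * intersection;
  }

  // Rated prefs: sum of |r1 - r2| over items rated by BOTH users.
  // ASSUMPTION: items rated by only one user are ignored.
  static double ratedDistance(Map<Long, Double> prefs1, Map<Long, Double> prefs2) {
    double sum = 0.0;
    for (Map.Entry<Long, Double> e : prefs1.entrySet()) {
      Double other = prefs2.get(e.getKey());
      if (other != null) {
        sum += Math.abs(e.getValue() - other);
      }
    }
    return sum;
  }

  // Maps a distance in [0, infinity) to a similarity in (0, 1];
  // identical users (distance 0) get similarity 1.
  static double toSimilarity(double distance) {
    return 1.0 / (1.0 + distance);
  }

  public static void main(String[] args) {
    int d = booleanDistance(Set.of(1L, 2L, 3L), Set.of(2L, 3L, 4L, 5L));
    System.out.println(d + " " + toSimilarity(d)); // 3 0.25

    double rd = ratedDistance(Map.of(1L, 5.0, 2L, 3.0),
                              Map.of(1L, 4.0, 2L, 1.0, 3L, 2.0));
    System.out.println(rd + " " + toSimilarity(rd)); // 3.0 0.25
  }
}
```

Because a distance of 0 maps to 1 and larger distances fall toward 0, the result always stays in (0, 1]. This avoids the odd mix of values in [-1, 0] or exactly 1 that Sean points out, and it follows the same distance-to-similarity mapping he mentions for EuclideanDistanceSimilarity.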
