[ 
https://issues.apache.org/jira/browse/MAHOUT-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013829#comment-13013829
 ] 

Sebastian Schelter commented on MAHOUT-643:
-------------------------------------------

I see some small todos left:

 * Naming should be consistent: either CityBlockDistanceSimilarity and 
DistributedCityBlockDistanceVectorSimilarity, or CityBlockSimilarity and 
DistributedCityBlockVectorSimilarity.
 * A new entry should be added to 
org.apache.mahout.math.hadoop.similarity.SimilarityType.
 * The "distributed" implementation is not correct IMHO: the output from 
weight() gives the number of users preferring a single item, and the number of 
cooccurrences gives the intersection size. The distance is therefore the sum 
of the two weights minus twice the number of cooccurrences.

It should look like this:

... extends AbstractDistributedVectorSimilarity ...

  @Override
  protected double doComputeResult(int rowA, int rowB, Iterable<Cooccurrence> cooccurrences,
      double weightOfVectorA, double weightOfVectorB, int numberOfColumns) {
    int cooccurrenceCount = countElements(cooccurrences);
    if (cooccurrenceCount == 0) {
      return Double.NaN;
    }
    
    // the weights are doubles, so the distance must be a double as well
    double distance = weightOfVectorA + weightOfVectorB - 2 * cooccurrenceCount;
    return 1.0 / (1.0 + distance);
  }

  @Override
  public double weight(Vector v) {
    return (double) countElements(v.iterateNonZero());
  }
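
To illustrate the arithmetic, here is a minimal standalone sketch that does not 
use any Mahout classes. The class name CityBlockSketch and its methods are 
hypothetical and exist only for this example. It assumes boolean preferences, 
so each vector is represented as the set of its non-zero indices, weight() 
corresponds to the set size, and the cooccurrence count corresponds to the 
intersection size.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class, not part of Mahout.
public class CityBlockSketch {

  /** Number of indices that are non-zero in both vectors (the cooccurrence count). */
  public static int intersectionSize(Set<Integer> a, Set<Integer> b) {
    Set<Integer> common = new HashSet<>(a);
    common.retainAll(b);
    return common.size();
  }

  /** City block distance for boolean vectors: |A| + |B| - 2 * |A intersect B|. */
  public static int distance(Set<Integer> a, Set<Integer> b) {
    return a.size() + b.size() - 2 * intersectionSize(a, b);
  }

  /** Maps the distance into (0, 1]; NaN when the vectors share no element. */
  public static double similarity(Set<Integer> a, Set<Integer> b) {
    if (intersectionSize(a, b) == 0) {
      return Double.NaN;
    }
    return 1.0 / (1.0 + distance(a, b));
  }

  public static void main(String[] args) {
    Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3));
    Set<Integer> b = new HashSet<>(Arrays.asList(2, 3, 4, 5));
    System.out.println(distance(a, b));   // 3 + 4 - 2 * 2 = 3
    System.out.println(similarity(a, b)); // 1 / (1 + 3) = 0.25
  }
}
```

The two sets share the indices 2 and 3 and differ in 1, 4 and 5, so counting 
the differing positions directly also gives a distance of 3.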

> Adding CityBlockSimilarity and DistributedCityBlockDistanceVectorSimilarity
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-643
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-643
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering, Math
>    Affects Versions: 0.5
>            Reporter: Daniel McEnnis
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: distance, patch, similarity
>             Fix For: 0.5
>
>         Attachments: MAHOUT-643-2.patch, MAHOUT-643.patch, patch4.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> adding a new distance metric to the 0.5 branch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
