[ https://issues.apache.org/jira/browse/MAHOUT-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-387:
-----------------------------
Status: Resolved (was: Patch Available)
Assignee: Sean Owen
Fix Version/s: 0.3
Resolution: Won't Fix
Yes, like Jeff said, this already exists as PearsonCorrelationSimilarity. When
the mean of each series is 0, the result is the same. The fastest way I know
to see this is to look at this form of the sample correlation:
http://upload.wikimedia.org/math/c/a/6/ca68fbe94060a2591924b380c9bc4e27.png ...
and note that sum(xi) = sum(yi) = 0 when the means of x and y are 0. You're
left with sum(xi*yi) in the numerator, which is the dot product, and
sqrt(sum(xi^2)) * sqrt(sum(yi^2)) in the denominator, which is the product of
the vector lengths. That is just the cosine of the angle between x and y.
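A worked version of that reduction, written with one common computational form of the sample correlation. This form is an assumption: the linked image is not reproduced here, so its notation may differ. Setting sum(xi) = sum(yi) = 0 removes the cross terms, and the factors of n cancel because sqrt(n) * sqrt(n) = n:

```latex
r_{xy} = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}
              {\sqrt{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}\,
               \sqrt{n\sum_i y_i^2 - \left(\sum_i y_i\right)^2}}
\;\xrightarrow{\;\sum_i x_i = \sum_i y_i = 0\;}\;
\frac{n\sum_i x_i y_i}{\sqrt{n\sum_i x_i^2}\,\sqrt{n\sum_i y_i^2}}
= \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}
= \frac{x \cdot y}{\lVert x \rVert\,\lVert y \rVert}
= \cos\theta
```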
One can argue about whether forcing the data to be centered is right. I think
it's a good thing in all cases: it adjusts for a user's tendency to rate high
or low on average. It also makes the computation simpler and more consistent
with Pearson (well, it makes it identical!). This has a good treatment:
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Geometric_interpretation
For this reason alone I'm marking this as won't-fix for the moment; the patch
is otherwise nice. I'd personally like to hear more about why not to center,
if there's an argument for it.
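To make the centering argument concrete, here is a small self-contained sketch. The class and method names are hypothetical and are not Mahout's PearsonCorrelationSimilarity API; it assumes two users who rated the same three items, with user B rating every item exactly 2 points higher than user A. Plain cosine similarity is pulled below 1 by that offset, while cosine on mean-centered data and the textbook Pearson formula both agree that the two users have identical taste.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration only (not Mahout's implementation): compares
 * plain cosine, cosine on mean-centered vectors, and Pearson correlation
 * for two fully overlapping rating vectors.
 */
public class CenteredCosineDemo {

    /** Cosine of the angle between x and y: dot(x, y) / (|x| * |y|). */
    static double cosine(double[] x, double[] y) {
        double dot = 0.0, xx = 0.0, yy = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            xx += x[i] * x[i];
            yy += y[i] * y[i];
        }
        return dot / (Math.sqrt(xx) * Math.sqrt(yy));
    }

    /** Returns a copy of v with its mean subtracted from every element. */
    static double[] center(double[] v) {
        double mean = Arrays.stream(v).average().orElse(0.0);
        return Arrays.stream(v).map(e -> e - mean).toArray();
    }

    /** Pearson correlation via the n*sum(xy) - sum(x)*sum(y) form. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
        }
        return (n * sxy - sx * sy)
            / (Math.sqrt(n * sxx - sx * sx) * Math.sqrt(n * syy - sy * sy));
    }

    public static void main(String[] args) {
        // User B rates the same items exactly 2 points higher than user A.
        double[] a = {1, 2, 3};
        double[] b = {3, 4, 5};
        System.out.printf("raw cosine      = %.4f%n", cosine(a, b));
        System.out.printf("centered cosine = %.4f%n", cosine(center(a), center(b)));
        System.out.printf("Pearson         = %.4f%n", pearson(a, b));
    }
}
```

Running it prints 0.9827 for the raw cosine and 1.0000 for both the centered cosine and Pearson. Note that this sketch assumes both vectors cover exactly the same items; a real recommender also has to decide which co-rated items to include and over which items to compute each user's mean.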
> Cosine item similarity implementation
> -------------------------------------
>
> Key: MAHOUT-387
> URL: https://issues.apache.org/jira/browse/MAHOUT-387
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Reporter: Sebastian Schelter
> Assignee: Sean Owen
> Fix For: 0.3
>
> Attachments: MAHOUT-387.patch
>
>
> I needed to compute the cosine similarity between two items when running
> org.apache.mahout.cf.taste.hadoop.pseudo.RecommenderJob. I couldn't find an
> implementation (maybe I overlooked it?), so I created my own. I want to
> share it here in case you find it useful.
--
This message is automatically generated by JIRA.