On Wed, Mar 3, 2010 at 11:21 AM, Tamas Jambor <[email protected]> wrote:
> for me one good practical indication would be, as you mentioned, that we
> don't have to deal with negative similarity, which is still a problem for
> me.

(I opened https://issues.apache.org/jira/browse/MAHOUT-321 to track this -- I have most of a patch done which removes the "+1" and replaces it with capping.)

> my understanding of the question is that when you center the data, you can
> interpret Pearson correlation as cosine similarity, but in fact that has
> nothing to do with cosine similarity in the sense of the original
> definition, since we transformed the vectors so their direction is different.

If you mean that the cosine of the angle between the original vectors differs from the cosine of the angle between the transformed (centered) vectors -- of course, yes. I'm suggesting the measure is more meaningful on the transformed vectors, for the reasons I gave. I understand this to be standard practice, but I am not an expert on this particular issue.

You can easily implement an uncentered cosine similarity metric. If you have reasons to do this, I'd be curious to hear them -- are there practical reasons for it?
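
To make the centered vs. uncentered distinction concrete, here is a minimal Java sketch. It is not Mahout's code: the class and method names are hypothetical, the ratings are made up, and it assumes both users rated the same items (a real recommender would restrict the computation to co-rated items). Pearson correlation is computed as the plain cosine of the mean-centered vectors.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Mahout.
final class SimilaritySketch {

    // Uncentered cosine similarity: dot(x, y) / (|x| * |y|).
    // Returns NaN if either vector is all zeros (0 / 0).
    static double cosine(double[] x, double[] y) {
        double dot = 0.0, nx = 0.0, ny = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    // Subtract the vector's own mean from every component.
    static double[] center(double[] x) {
        double mean = Arrays.stream(x).average().orElse(0.0);
        return Arrays.stream(x).map(val -> val - mean).toArray();
    }

    // Pearson correlation, computed as the cosine of the centered vectors.
    static double pearson(double[] x, double[] y) {
        return cosine(center(x), center(y));
    }

    public static void main(String[] args) {
        // Made-up ratings of the same five items by three users.
        double[] u = {5, 4, 4, 1, 2};
        double[] v = {4, 5, 3, 2, 1};
        double[] w = {1, 2, 2, 5, 4};
        System.out.printf("u,v: cosine=%.4f pearson=%.4f%n", cosine(u, v), pearson(u, v));
        System.out.printf("u,w: cosine=%.4f pearson=%.4f%n", cosine(u, w), pearson(u, w));
    }
}
```

With these numbers, u and w have opposite tastes: their Pearson correlation is -1, yet their uncentered cosine is still about 0.61. For non-negative ratings every term of the dot product is non-negative, so the uncentered cosine can never go negative; it is the centering step that produces negative similarities, and also what lets the measure tell opposite tastes apart.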
