[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146215#comment-13146215 ]
Ankur commented on MAHOUT-344:
------------------------------

Grant,

The idea behind keyGroups is to concatenate hashes from multiple hash functions to reduce the probability of collision between 2 users that merely agree on 1 or more individual hash values. This essentially improves the average similarity of users in a cluster.

About documentation, I am caught up with a few urgent issues at work and will need more time. I hope to get some free cycles before the end of this week.

> Minhash based clustering
> -------------------------
>
>                 Key: MAHOUT-344
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-344
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Ankur
>            Assignee: Ankur
>             Fix For: 0.4
>
>         Attachments: MAHOUT-344-v1.patch, MAHOUT-344-v2.patch,
> MAHOUT-344-v3.patch, MAHOUT-344-v4.patch, MAHOUT-344-v5.patch,
> MAHOUT-344-v6.patch, MAHOUT-344-v7.patch
>
>
> Minhash clustering performs probabilistic dimension reduction of high
> dimensional data. The essence of the technique is to hash each item using
> multiple independent hash functions such that the probability of collision of
> similar items is higher. Multiple such hash tables can then be constructed
> to answer near neighbor type of queries efficiently.
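
To make the keyGroups idea concrete, here is a minimal sketch in Python. It is not the code from the attached patches: the function names, the linear hash family, the "-" separator, and the grouping of consecutive hash values are all assumptions made for illustration. For an ideal minhash function, two users agree on a single hash value with probability equal to the Jaccard similarity s of their item sets, so a key built from k concatenated values matches with probability of roughly s^k. Dissimilar users are therefore much less likely to share a cluster key.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1, used as the hash modulus (illustrative choice)


def make_hash_functions(count, seed=42):
    """Return `count` random linear hashes h(x) = (a*x + b) mod PRIME (hypothetical family)."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(count)]
    # Bind a and b as defaults so each lambda keeps its own parameters.
    return [lambda x, a=a, b=b: (a * x + b) % PRIME for a, b in params]


def minhash_signature(items, hash_functions):
    """One minhash value per hash function: the minimum hash over the item set."""
    return [min(h(item) for item in items) for h in hash_functions]


def cluster_keys(signature, key_groups):
    """Concatenate each run of `key_groups` consecutive minhash values into one key.

    Trailing values that do not fill a whole group are dropped.
    """
    usable = len(signature) - len(signature) % key_groups
    return [
        "-".join(str(v) for v in signature[i:i + key_groups])
        for i in range(0, usable, key_groups)
    ]


def count_shared(keys_a, keys_b):
    """Count positions at which two users produced the same cluster key."""
    return sum(1 for ka, kb in zip(keys_a, keys_b) if ka == kb)


hashes = make_hash_functions(count=10)
user_a = {1, 2, 3, 4, 5, 6, 7, 8}
user_b = {1, 2, 3, 4, 5, 6, 7, 9}  # Jaccard similarity with user_a is 7/9
sig_a = minhash_signature(user_a, hashes)
sig_b = minhash_signature(user_b, hashes)

for k in (1, 2, 5):
    shared = count_shared(cluster_keys(sig_a, k), cluster_keys(sig_b, k))
    print(f"keyGroups={k}: {shared} shared cluster keys")
```

With keyGroups=1, any single agreeing hash value puts the two users in a common cluster. With larger values, every hash in a group must agree, so the number of shared keys can only stay the same or drop. The trade-off is that fewer users collide at all, so clusters become smaller but more homogeneous.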