[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cristi Prodan updated MAHOUT-344:
---------------------------------

    Attachment: MAHOUT-344-v2.patch

See comment above for this patch.

> Minhash based clustering
> -------------------------
>
>                 Key: MAHOUT-344
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-344
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Ankur
>            Assignee: Ankur
>         Attachments: MAHOUT-344-v1.patch, MAHOUT-344-v2.patch
>
>
> Minhash clustering performs probabilistic dimension reduction of high
> dimensional data. The essence of the technique is to hash each item using
> multiple independent hash functions such that the probability of collision of
> similar items is higher. Multiple such hash tables can then be constructed
> to answer near neighbor type of queries efficiently.
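
For readers unfamiliar with the technique summarized in the issue description, here is a minimal sketch of the general minhash idea. It is not taken from either attached patch and does not reflect Mahout's code. The function names, the linear hash family h(x) = (a*x + b) mod p, and the integer feature encoding are all assumptions made for illustration only.

```python
import random

# Assumption: features are non-negative integers smaller than PRIME.
# PRIME (2**61 - 1) is the modulus of the illustrative hash family.
PRIME = (1 << 61) - 1


def make_hash_functions(num_hashes, seed=0):
    """Return (a, b) pairs, each defining h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(features, hash_functions):
    """Keep, for every hash function, the minimum hash over the item's features.

    Raises ValueError if `features` is empty.
    """
    return tuple(min((a * x + b) % PRIME for x in features)
                 for a, b in hash_functions)


def estimate_similarity(sig_a, sig_b):
    """Fraction of positions where two signatures agree.

    This approximates the Jaccard similarity of the underlying feature sets.
    """
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def cluster_keys(signature, group_size):
    """Group consecutive min-hash values into keys, one per hash table.

    Items that share a key land in the same bucket of that table, so similar
    items are more likely to collide in at least one table.
    """
    last_start = len(signature) - group_size
    return [(start // group_size, signature[start:start + group_size])
            for start in range(0, last_start + 1, group_size)]
```

Two items would be treated as near-neighbor candidates when any of their cluster keys match. A larger `group_size` makes collisions between dissimilar items rarer, while more groups raise the chance that truly similar items collide at least once.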