[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849402#action_12849402 ]
Cristi Prodan commented on MAHOUT-344:
--------------------------------------

I've been studying the min-hash algorithm over the past few days, and I've looked a little at your implementation. I've also been going through Mahout's code, the wiki, the notes on how to contribute, etc.

I'd like to try my hand at a patch for Mahout before submitting my GSoC proposal, and I would like to extend or improve this implementation. Could you please point me to a way or an idea for how I might do this? (I would leave its integration with Taste as a second task for myself.)

Thank you.

> Minhash based clustering
> -------------------------
>
>                 Key: MAHOUT-344
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-344
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Ankur
>            Assignee: Ankur
>         Attachments: MAHOUT-344-v1.patch
>
>
> Minhash clustering performs probabilistic dimension reduction of high-dimensional
> data. The essence of the technique is to hash each item using multiple independent
> hash functions such that similar items collide with higher probability than
> dissimilar ones. Multiple such hash tables can then be constructed to answer
> near-neighbor queries efficiently.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
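
For reference, the technique summarized in the quoted issue description can be illustrated with a small, self-contained sketch. This is not the code in MAHOUT-344-v1.patch. The class name `MinHashSketch`, the linear hash family `(a*x + b) mod p`, the seed, and the number of hash functions are all assumptions made for illustration, and linear hashes only approximate truly independent min-wise hash functions. Items are assumed to be sets of non-negative integer IDs smaller than `p`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Illustrative minhash sketch (hypothetical class, not Mahout's implementation). */
final class MinHashSketch {
    // Mersenne prime 2^31 - 1; item IDs are assumed to lie in [0, PRIME).
    private static final int PRIME = Integer.MAX_VALUE;

    private final long[] a; // multipliers in [1, PRIME - 1]
    private final long[] b; // offsets in [0, PRIME - 1]

    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(PRIME - 1);
            b[i] = rnd.nextInt(PRIME);
        }
    }

    /** Signature: for each hash function, the minimum hash value over the set. */
    long[] signature(Set<Integer> items) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int item : items) {
            for (int i = 0; i < a.length; i++) {
                // Both factors are below 2^31, so the product fits in a long.
                long h = (a[i] * item + b[i]) % PRIME;
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    /** Fraction of matching signature components; an estimate of Jaccard similarity. */
    static double estimateSimilarity(long[] s1, long[] s2) {
        int matches = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                matches++;
            }
        }
        return (double) matches / s1.length;
    }

    /** Helper for the demo: the set {from, ..., to - 1}. */
    private static Set<Integer> range(int from, int to) {
        Set<Integer> s = new HashSet<>();
        for (int i = from; i < to; i++) {
            s.add(i);
        }
        return s;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(200, 42L);
        long[] x = mh.signature(range(0, 100));
        long[] y = mh.signature(range(10, 110));     // overlaps heavily with x
        long[] z = mh.signature(range(1000, 1100));  // disjoint from x
        System.out.println("similar sets:  " + estimateSimilarity(x, y));
        System.out.println("disjoint sets: " + estimateSimilarity(x, z));
    }
}
```

Two items whose signatures agree in a given component fall into the same bucket for that hash function, which is what makes the "multiple hash tables" lookup in the description possible. A clustering job would typically group items by one or more signature components rather than comparing every pair as the demo does.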