[ 
https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849402#action_12849402
 ] 

Cristi Prodan commented on MAHOUT-344:
--------------------------------------

I've been studying the min-hash algorithm over the past few days, and have 
looked at your implementation a little. I've also been reading through Mahout's 
code, the wiki, the how-to-contribute guide, etc.

I'm thinking of trying my hand at submitting a patch to Mahout before sending 
in my proposal for GSoC. I would like to extend/improve this implementation. 
Could you please suggest a direction or idea for how I might do this? (I would 
leave its integration with Taste as a second task for me.)

Thank you.

> Minhash based clustering 
> -------------------------
>
>                 Key: MAHOUT-344
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-344
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Ankur
>            Assignee: Ankur
>         Attachments: MAHOUT-344-v1.patch
>
>
> Minhash clustering performs probabilistic dimension reduction of 
> high-dimensional data. The essence of the technique is to hash each item using 
> multiple independent hash functions such that similar items collide with 
> higher probability than dissimilar ones. Multiple such hash tables can then be 
> constructed to answer near-neighbor queries efficiently.
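
For reference, the hashing idea in the description above can be sketched
roughly as follows. This is only a minimal illustration and not the code in
MAHOUT-344-v1.patch: the class name MinHashSketch, the linear hash family
(a*x + b) mod p, the seed and the number of hash functions are all assumptions
made for the example. Each item is treated as a set of integer feature ids, and
the fraction of matching signature slots serves as an estimate of the Jaccard
similarity of two sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical illustration class; not part of Mahout.
final class MinHashSketch {
    // Modulus for the hash family: the Mersenne prime 2^31 - 1.
    private static final long PRIME = 2147483647L;

    private final long[] a; // multipliers, each in [1, PRIME - 1]
    private final long[] b; // offsets, each in [0, PRIME - 1]

    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt((int) PRIME - 1);
            b[i] = rnd.nextInt((int) PRIME);
        }
    }

    // Signature slot i holds the minimum of hash function i over the set.
    long[] signature(Set<Integer> items) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int item : items) {
            long x = Math.floorMod((long) item, PRIME);
            for (int i = 0; i < a.length; i++) {
                // a[i] and x are both below 2^31, so the product fits in a long.
                long h = (a[i] * x + b[i]) % PRIME;
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    // Fraction of slots on which two signatures agree.
    static double estimateSimilarity(long[] s1, long[] s2) {
        int matches = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) {
                matches++;
            }
        }
        return (double) matches / s1.length;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(128, 42L);
        Set<Integer> x = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        Set<Integer> y = new HashSet<>(Arrays.asList(4, 5, 6, 7, 8, 9));
        // True Jaccard similarity is 3/9; the estimate should be in that region.
        System.out.println(estimateSimilarity(mh.signature(x), mh.signature(y)));
    }
}
```

To get from signatures to clusters or near-neighbor lookups, one would
typically group several signature slots into a key and bucket items by that
key, so that only items sharing a bucket are compared. How the attached patch
does this is not shown here.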

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
