[ 
https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146215#comment-13146215
 ] 

Ankur commented on MAHOUT-344:
------------------------------

Grant, The idea behind keyGroups is to concatenate hashes from multiple hash 
functions reduce the probability of collision between 2 users that agreed on 1 
or more individual hash values. This essentially improves the average 
similarity of users in a cluster.

About documentation, I am caught up with a few urgent issues at work and will 
need more time. Hope to get some free cycles before end of this week.   
                
> Minhash based clustering 
> -------------------------
>
>                 Key: MAHOUT-344
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-344
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.3
>            Reporter: Ankur
>            Assignee: Ankur
>             Fix For: 0.4
>
>         Attachments: MAHOUT-344-v1.patch, MAHOUT-344-v2.patch, 
> MAHOUT-344-v3.patch, MAHOUT-344-v4.patch, MAHOUT-344-v5.patch, 
> MAHOUT-344-v6.patch, MAHOUT-344-v7.patch
>
>
> Minhash clustering performs probabilistic dimension reduction of high 
> dimensional data. The essence of the technique is to hash each item using 
> multiple independent hash functions such that the probability of collision of 
> similar items is higher. Multiple such hash tables can then be constructed  
> to answer near neighbor type of queries efficiently.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to