[ 
https://issues.apache.org/jira/browse/MAHOUT-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972073#action_12972073
 ] 

Ankur commented on MAHOUT-565:
------------------------------

> ...The shifted-in bits don't matter right?
You are right. This change is NOT needed. Masking is only needed when 
reassembling an integer from its bytes. Somewhere else (not in Mahout's code) 
I was corrupting the bytes when converting them back to an integer, so I 
added the mask here out of caution. This particular change can be discarded.
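
To illustrate the point about masking, here is a minimal Java sketch. It is 
not taken from the patch or from Mahout; the class and method names are 
hypothetical. It shows that the cast to byte already discards any shifted-in 
sign bits, while the reverse conversion needs the 0xFF mask because each byte 
is sign-extended when promoted to int.

```java
// Hypothetical helper for illustration only; not Mahout code.
class IntBytes {

    // Narrowing to byte keeps only the low 8 bits, so whatever the
    // arithmetic shift (>>) pulls in from the sign bit is thrown away.
    static byte[] toBytes(int v) {
        return new byte[] {
            (byte) (v >> 24), (byte) (v >> 16), (byte) (v >> 8), (byte) v
        };
    }

    // Each byte is sign-extended when promoted to int, so it must be
    // masked with 0xFF before being shifted back into place.
    static int fromBytes(byte[] b) {
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             |  (b[3] & 0xFF);
    }

    // Same as above without the mask: a negative low-order byte
    // overwrites the higher bytes with ones.
    static int fromBytesUnmasked(byte[] b) {
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(200);
        System.out.println(fromBytes(bytes));          // 200
        System.out.println(fromBytesUnmasked(bytes));  // -56
    }
}
```

The unmasked version is the kind of mistake described above: the value 200 
has a low byte of 0xC8, which is negative as a Java byte, so it comes back 
as -56.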

> The formatting changes are fine IMHO
Thanks. I set up the code template mentioned on the "How to Contribute" page.

> There are several other changes in this patch, is that intended?
There are 2 noteworthy changes:
1. Concatenating hash signatures in a sliding-window fashion. This makes sure 
that an item falls into as many buckets as the number of hash signatures 
selected, giving it more room for collision with similar items.
2. Fixing the test case in TestMinHashClustering. It was missing evaluation 
of the last cluster.
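
The sliding-window concatenation in change 1 can be sketched roughly as 
follows. This is an illustrative sketch, not the code in the patch: the class 
name, the method name, the keyGroups parameter, and the '-' separator are 
assumptions, and so is the wrap-around at the end of the array, which is one 
way to get exactly as many bucket keys as there are hash signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the patch itself.
class SlidingWindowKeys {

    // Builds one bucket key per minhash value. The key starting at
    // position i joins keyGroups consecutive values, wrapping around
    // to the start of the array (assumed behavior).
    static List<String> bucketKeys(int[] minHashes, int keyGroups) {
        int n = minHashes.length;
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < keyGroups; j++) {
                if (j > 0) {
                    sb.append('-');
                }
                sb.append(minHashes[(i + j) % n]);
            }
            keys.add(sb.toString());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Three signatures, windows of two: prints [7-3, 3-9, 9-7]
        System.out.println(bucketKeys(new int[] {7, 3, 9}, 2));
    }
}
```

With non-overlapping groups the same three signatures would yield a single 
complete key, so the sliding window gives each item more buckets in which to 
collide with similar items.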

I haven't had time to write up the Mahout documentation for this. I also 
need to think about using the results in a recommendations context. Any 
suggestions?

> Features incorrectly hashed in Minhash
> --------------------------------------
>
>                 Key: MAHOUT-565
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-565
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.4
>            Reporter: Ankur
>            Assignee: Ankur
>         Attachments: jira-565.v1.patch
>
>
> Given a feature vector for which a minhash signature is desired, each 
> feature id (an integer) is converted to a byte array through a series of 
> bit shift operations. The current implementation of these operations doesn't 
> mask the bits being shifted, resulting in the sign bit being shifted.
