[
https://issues.apache.org/jira/browse/MAHOUT-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972073#action_12972073
]
Ankur commented on MAHOUT-565:
------------------------------
> ...The shifted-in bits don't matter right?
You are right. This change is NOT needed. The masking is only needed when
reassembling an integer from its bytes. Somewhere else (not in Mahout's code)
I was corrupting the bytes when converting them back to an integer, so out of
caution I added the mask here as well. This particular change can be
discarded.
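
To illustrate the point about masking, here is a minimal sketch (not Mahout's
code; the class and method names are hypothetical). It assumes big-endian byte
order. Splitting an int needs no mask because the narrowing cast to byte keeps
only the low 8 bits, while reassembling does need one because each byte is
sign-extended when promoted to int.

```java
// Hypothetical illustration, not taken from the Mahout code base.
final class ByteMaskSketch {

    // int -> 4 bytes, big-endian. No mask is needed here: the (byte) cast
    // discards everything except the low 8 bits, so shifted-in bits are lost.
    static byte[] toBytes(int value) {
        return new byte[] {
            (byte) (value >> 24),
            (byte) (value >> 16),
            (byte) (value >> 8),
            (byte) value
        };
    }

    // 4 bytes -> int. The mask IS needed here: a negative byte is
    // sign-extended to 0xFFFFFFxx on promotion and would clobber higher bits.
    static int fromBytes(byte[] b) {
        return ((b[0] & 0xFF) << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             | (b[3] & 0xFF);
    }

    // Buggy variant without the mask, shown only for comparison.
    static int fromBytesUnmasked(byte[] b) {
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }
}
```

With this sketch, a value such as 200 survives the round trip through
fromBytes but not through fromBytesUnmasked, since its low byte is negative
as a Java byte.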
> The formatting changes are fine IMHO
Thanks. I set up the code template mentioned in "How to Contribute".
> There are several other changes in this patch, is that intended?
There are 2 noteworthy changes:
1. Concatenating hash signatures in a sliding-window fashion. This ensures
that an item falls into as many buckets as the number of hash signatures
selected, giving it more room to collide with similar items.
2. Fixing the test case in TestMinHashClustering. It was missing the
evaluation of the last cluster.
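
The sliding-window concatenation in item 1 could be sketched roughly as
follows. This is an illustration only, not the code in the patch: the class
name, method name, the keyGroups parameter and the "-" separator are
assumptions. It also assumes the window wraps around the end of the signature
array, which is what lets the number of bucket keys equal the number of hash
signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not taken from the patch.
final class SlidingWindowKeysSketch {

    // Builds one bucket key per min-hash value by concatenating keyGroups
    // consecutive values, starting at each position in turn.
    // Assumes keyGroups >= 1 and a non-empty signature array.
    static List<String> bucketKeys(int[] minHashValues, int keyGroups) {
        int n = minHashValues.length;
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            StringBuilder key = new StringBuilder();
            for (int j = 0; j < keyGroups; j++) {
                if (j > 0) {
                    key.append('-');
                }
                // Assumption: wrap around so every start position yields a full key.
                key.append(minHashValues[(i + j) % n]);
            }
            keys.add(key.toString());
        }
        return keys;
    }
}
```

For min-hash values {7, 3, 9} and keyGroups = 2, this yields three keys
(7-3, 3-9, 9-7), so the item lands in three buckets, where non-overlapping
groups of two would give only one full key.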
I haven't had the time to write up the Mahout documentation for this. Also, I
need to think about using the results in a recommendations context. Any
suggestions?
> Features incorrectly hashed in Minhash
> --------------------------------------
>
> Key: MAHOUT-565
> URL: https://issues.apache.org/jira/browse/MAHOUT-565
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.4
> Reporter: Ankur
> Assignee: Ankur
> Attachments: jira-565.v1.patch
>
>
> Given a feature vector for which a minhash signature is desired, each feature
> id (an integer) is converted to a byte array through a series of bit-shift
> operations. The current implementation of these operations doesn't mask the
> bits being shifted, resulting in the sign bit being shifted.