[ 
https://issues.apache.org/jira/browse/HADOOP-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497062
 ] 

Owen O'Malley commented on HADOOP-1385:
---------------------------------------

The extra function gives an explicit name to what I was doing for the hash. 
Clearly there are lots of things that you could use as a hash code, so if 
someone just wants the first 4 bytes, they can call the new function rather 
than hash code.

The loop is easier to read and harder to get wrong, and a decent optimizer 
could unroll the loop for you. Granted, I don't expect Java to do that, but I 
don't expect this function to be a performance bottleneck.
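
For readers without the patch at hand, the trade-off between the loop and a hand-unrolled version might look roughly like the sketch below. This is not the code from 1385.patch: the class name, the method names, and the big-endian byte order are all assumptions made for illustration.

```java
// Illustrative sketch only; names and byte order are assumptions,
// not taken from the attached patch.
public class FirstFourBytesSketch {

    // Loop form: packs the first 4 bytes of a digest into an int.
    static int firstFourBytesLoop(byte[] digest) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            // Mask with 0xff so a negative byte is not sign-extended
            // when it is promoted to int.
            value = (value << 8) | (digest[i] & 0xff);
        }
        return value;
    }

    // Hand-unrolled form: same result, but four places to get a shift
    // or a mask wrong instead of one.
    static int firstFourBytesUnrolled(byte[] digest) {
        return ((digest[0] & 0xff) << 24)
             | ((digest[1] & 0xff) << 16)
             | ((digest[2] & 0xff) << 8)
             |  (digest[3] & 0xff);
    }

    public static void main(String[] args) {
        // Only the first four bytes matter for this sketch.
        byte[] digest = {(byte) 0x9e, 0x10, 0x7d, (byte) 0x9d};
        System.out.println(Integer.toHexString(firstFourBytesLoop(digest)));
        System.out.println(Integer.toHexString(firstFourBytesUnrolled(digest)));
    }
}
```

Both methods return the same value for any input of at least four bytes; the loop simply keeps the masking logic in one place.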

> MD5Hash has a bad hash function
> -------------------------------
>
>                 Key: HADOOP-1385
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1385
>             Project: Hadoop
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.12.3
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.13.0
>
>         Attachments: 1385.patch
>
>
> The MD5Hash class has a really bad hash function that will cause most 
> md5s to hash to 0xFFFFFFxx, leaving only the low order byte as meaningful. The 
> problem comes from the automatic sign extension when promoting from byte to 
> int.
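
The sign extension described in the issue can be reproduced with a small, self-contained sketch. The code below is a hypothetical reconstruction, not the original MD5Hash source: the class name, the method names, and the order in which bytes are combined are assumptions. It only shows how an unmasked negative byte floods the upper bits once it is promoted to int.

```java
// Hypothetical reconstruction for illustration; not the MD5Hash source.
public class SignExtensionSketch {

    // Buggy pattern: each byte is promoted to int with sign extension,
    // so a negative low byte sets all 24 upper bits of the result.
    static int unmaskedHash(byte[] digest) {
        return (digest[0] << 24) | (digest[1] << 16) | (digest[2] << 8) | digest[3];
    }

    // Fixed pattern: masking with 0xff keeps only the low 8 bits of
    // each promoted byte before it is combined.
    static int maskedHash(byte[] digest) {
        return ((digest[0] & 0xff) << 24)
             | ((digest[1] & 0xff) << 16)
             | ((digest[2] & 0xff) << 8)
             |  (digest[3] & 0xff);
    }

    public static void main(String[] args) {
        byte[] digest = {0x12, 0x34, 0x56, (byte) 0x9d};
        // prints ffffff9d: only the low-order byte survives
        System.out.println(Integer.toHexString(unmaskedHash(digest)));
        // prints 1234569d: all four bytes contribute
        System.out.println(Integer.toHexString(maskedHash(digest)));
    }
}
```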

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.