[ https://issues.apache.org/jira/browse/HADOOP-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497062 ]
Owen O'Malley commented on HADOOP-1385:
---------------------------------------
The extra function gives an explicit name to what I was doing for the hash.
Clearly there are lots of things that you could use as a hash code, so if
someone just wants the first 4 bytes they can call the new function rather than
hashCode().
The loop is easier to read and harder to get wrong, and a decent optimizer
could unroll it for you. Granted, I don't expect Java to do that, but I
don't expect this function to be a performance bottleneck either.
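
To make the loop-versus-unrolled trade-off concrete, here is a minimal sketch.
It is not the code from 1385.patch; the class and method names are hypothetical
and only illustrate packing the first 4 bytes of a digest into an int both ways.

```java
final class FirstFourBytesSketch {
    // Hypothetical helper: packs the first four digest bytes into an int
    // with a loop.
    static int firstFourBytesLoop(byte[] digest) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            // Mask with 0xff so a negative byte is not sign-extended.
            value = (value << 8) | (digest[i] & 0xff);
        }
        return value;
    }

    // Hand-unrolled equivalent, shown only for comparison.
    static int firstFourBytesUnrolled(byte[] digest) {
        return ((digest[0] & 0xff) << 24)
             | ((digest[1] & 0xff) << 16)
             | ((digest[2] & 0xff) << 8)
             |  (digest[3] & 0xff);
    }

    public static void main(String[] args) {
        byte[] digest = {0x12, (byte) 0xab, 0x34, (byte) 0xcd};
        System.out.println(Integer.toHexString(firstFourBytesLoop(digest)));
        System.out.println(Integer.toHexString(firstFourBytesUnrolled(digest)));
    }
}
```

Both methods return the same value; the unrolled form has four places to get a
shift or a mask wrong, while the loop has one.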
> MD5Hash has a bad hash function
> -------------------------------
>
> Key: HADOOP-1385
> URL: https://issues.apache.org/jira/browse/HADOOP-1385
> Project: Hadoop
> Issue Type: Bug
> Components: io
> Affects Versions: 0.12.3
> Reporter: Owen O'Malley
> Assigned To: Owen O'Malley
> Fix For: 0.13.0
>
> Attachments: 1385.patch
>
>
> The MD5Hash class has a really bad hash function that will cause most
> md5s to hash to 0xFFFFFFxx, leaving only the low-order byte as meaningful. The
> problem comes from the automatic sign extension when promoting from byte to
> int.
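
The sign-extension effect described above can be reproduced with a small
sketch. This is an assumed reconstruction of the kind of expression that goes
wrong, not the original MD5Hash code; the class and method names are
hypothetical.

```java
final class SignExtensionSketch {
    // Buggy combination: each byte is promoted to int with its sign bit
    // extended, so a negative low byte sets all of the upper 24 bits.
    static int unmaskedHash(byte[] digest) {
        return (digest[0] << 24) | (digest[1] << 16)
             | (digest[2] << 8) | digest[3];
    }

    // Masking each byte with 0xff keeps only its low 8 bits after promotion.
    static int maskedHash(byte[] digest) {
        return ((digest[0] & 0xff) << 24) | ((digest[1] & 0xff) << 16)
             | ((digest[2] & 0xff) << 8) | (digest[3] & 0xff);
    }

    public static void main(String[] args) {
        byte[] digest = {0x12, 0x34, 0x56, (byte) 0x9a};
        // prints ffffff9a
        System.out.println(Integer.toHexString(unmaskedHash(digest)));
        // prints 1234569a
        System.out.println(Integer.toHexString(maskedHash(digest)));
    }
}
```

In the unmasked version the byte 0x9a is promoted to 0xFFFFFF9A before the OR,
which wipes out the contribution of the other three bytes.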
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.