[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506505#comment-13506505
 ] 

Steve Loughran commented on MAPREDUCE-4827:
-------------------------------------------

This looks good to me.
 # As Doug says, this could be a regression; perhaps a 
{{BetterHashPartitioner}} is needed, or this could be made an option.
 # We cannot put code from the JDK into the tree, or code that looks exactly 
like it, as we don't want another Oracle-related lawsuit. That means we need to 
find some tangible reference, such as the page in Knuth, and the code has to be 
worked out from there by someone who hasn't looked at the HashMap code.
This is being over-cautious, but given that Oracle sued Google over a max 
function we have to show that diligence.
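
To make the idea concrete, here is a minimal sketch of a partitioner that 
applies a mixing step on top of hashCode(). It is not the attached patch and 
not taken from the JDK: the mixing step is the 32-bit finalizer from 
MurmurHash3, which its author released into the public domain. The class and 
method names (BetterHashSketch, mix, plainPartition, mixedPartition) are 
hypothetical, and the class is standalone rather than a real Hadoop 
Partitioner subclass. plainPartition mirrors what the stock HashPartitioner 
does: mask off the sign bit, then take the remainder.

```java
// Sketch only: names are hypothetical and this is not the code in betterhash1.txt.
final class BetterHashSketch {

    // MurmurHash3 32-bit finalizer (public domain). It spreads the
    // influence of every input bit across all output bits.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Same arithmetic as the stock HashPartitioner: clear the sign bit, then mod.
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Same as above, but the hash code is mixed first.
    static int mixedPartition(Object key, int numPartitions) {
        return (mix(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        int[] plain = new int[numPartitions];
        int[] mixed = new int[numPartitions];

        // Integer keys that are all multiples of 8: Integer.hashCode() is the
        // value itself, so the plain scheme sends every key to partition 0.
        for (int i = 0; i < 1000; i++) {
            Integer key = i * 8;
            plain[plainPartition(key, numPartitions)]++;
            mixed[mixedPartition(key, numPartitions)]++;
        }

        for (int p = 0; p < numPartitions; p++) {
            System.out.println("partition " + p
                + ": plain=" + plain[p] + " mixed=" + mixed[p]);
        }
    }
}
```

Because the mixed variant changes which reducer each key goes to, jobs that 
depend on the current assignment would see different output layout, which is 
why a separate class or an option seems safer than changing the default.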
                
> Increase hash quality of HashPartitioner
> ----------------------------------------
>
>                 Key: MAPREDUCE-4827
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Radim Kolar
>         Attachments: betterhash1.txt
>
>
> HashPartitioner uses Object.hashCode() to split keys into 
> partitions. This results in bad distributions because hashCode() quality is 
> poor. 
> These hashCode() functions are sometimes written by hand (very poor quality) 
> and sometimes generated by Commons Lang code (poor quality). Applying 
> some transformation on top of hashCode() provides better distribution.

