[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated MAPREDUCE-4827:
---
    Resolution: Won't Fix
        Status: Resolved (was: Patch Available)

Patch rejected for backward compatibility reasons.

> Increase hash quality of HashPartitioner
>
>                 Key: MAPREDUCE-4827
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Radim Kolar
>         Attachments: betterhash1.txt, betterhash2.txt
>
> HashPartitioner uses Object.hashCode() to split keys into partitions.
> This results in bad distributions because the quality of hashCode() is
> poor.
> These hashCode() functions are sometimes written by hand (very poor
> quality) and sometimes generated by Commons Lang code (poor quality).
> Applying some transformation on top of hashCode() provides a better
> distribution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
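The issue description says that applying a transformation on top of hashCode() improves the distribution, but the thread does not show what betterhash1.txt or betterhash2.txt actually contain. The following is only a minimal sketch of the general idea, not the rejected patch. It assumes the stock partitioner computes (hashCode & Integer.MAX_VALUE) % numPartitions, and it uses the MurmurHash3 32-bit finalizer as a stand-in mixing step. The class and method names (PartitionSketch, plainPartition, mixedPartition, histogram) are hypothetical and do not come from Hadoop.

```java
import java.util.Arrays;

public class PartitionSketch {

    // Stock-style partitioning: clear the sign bit, then take the remainder.
    static int plainPartition(int hashCode, int numPartitions) {
        return (hashCode & Integer.MAX_VALUE) % numPartitions;
    }

    // Stand-in bit mixer (MurmurHash3 32-bit finalizer). This is an
    // assumption chosen for illustration, not the transformation from the patch.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 16;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Same partitioning arithmetic, applied to the mixed hash code.
    static int mixedPartition(int hashCode, int numPartitions) {
        return (mix(hashCode) & Integer.MAX_VALUE) % numPartitions;
    }

    // Counts keys per partition for hash codes 0, stride, 2*stride, ...
    static int[] histogram(boolean mixed, int numKeys, int stride, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numKeys; i++) {
            int h = i * stride;
            int p = mixed ? mixedPartition(h, numPartitions)
                          : plainPartition(h, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hash codes that are all multiples of 16, split over 16 partitions:
        // without mixing, every key lands in partition 0.
        System.out.println(Arrays.toString(histogram(false, 1600, 16, 16)));
        // With mixing, the same keys are spread over several partitions.
        System.out.println(Arrays.toString(histogram(true, 1600, 16, 16)));
    }
}
```

Any such change also alters which partition a given key is assigned to, so output produced before and after the change would be partitioned differently. That is one plausible reading of the "backward compatibility reasons" cited in the resolution, although the thread does not spell them out.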
[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated MAPREDUCE-4827:
---
    Attachment: betterhash2.txt

change it for old mapred api as well

> Increase hash quality of HashPartitioner
>
>                 Key: MAPREDUCE-4827
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Radim Kolar
>         Attachments: betterhash1.txt, betterhash2.txt
[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated MAPREDUCE-4827:
---
    Status: Patch Available (was: Open)

> Increase hash quality of HashPartitioner
>
>                 Key: MAPREDUCE-4827
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Radim Kolar
>         Attachments: betterhash1.txt
[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner
[ https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated MAPREDUCE-4827:
---
    Attachment: betterhash1.txt

> Increase hash quality of HashPartitioner
>
>                 Key: MAPREDUCE-4827
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Radim Kolar
>         Attachments: betterhash1.txt