[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-12-17 Thread Radim Kolar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated MAPREDUCE-4827:
---

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Patch rejected for backward compatibility reasons.

> Increase hash quality of HashPartitioner
> ----------------------------------------
>
>         Key: MAPREDUCE-4827
>         URL: https://issues.apache.org/jira/browse/MAPREDUCE-4827
>     Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>    Reporter: Radim Kolar
> Attachments: betterhash1.txt, betterhash2.txt
>
>
> HashPartitioner uses Object.hashCode() to split keys into partitions. This
> results in poorly balanced partitions because hashCode() implementations are
> often of low quality. They are sometimes written by hand (very poor quality)
> and sometimes generated by Commons Lang code (poor quality). Applying a
> transformation on top of hashCode() provides a better distribution.
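
To make the idea in the description concrete, here is a minimal sketch of what "a transformation on top of hashCode()" could look like. The attached patches (betterhash1.txt, betterhash2.txt) are not reproduced in this thread, so the mixing step below is an assumption: it borrows the 32-bit finalizer from MurmurHash3, which may or may not be what the patch used. The class and method names are hypothetical, and the code is plain Java with no Hadoop dependency; only the formula in defaultPartition mirrors the stock HashPartitioner.

```java
import java.util.Arrays;

public class MixedHashPartitionSketch {

    // Stock HashPartitioner formula: clear the sign bit, then take the remainder.
    public static int defaultPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // ASSUMPTION: MurmurHash3 32-bit finalizer used as the extra mixing step.
    // It spreads every input bit over the whole output word.
    public static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Same formula as defaultPartition, applied to the mixed hash code.
    public static int mixedPartition(Object key, int numPartitions) {
        return (mix(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        int[] plain = new int[numPartitions];
        int[] mixed = new int[numPartitions];

        // Integer.hashCode() is the value itself, so keys that are all
        // multiples of 8 share the same low three bits.
        for (int i = 0; i < 1000; i++) {
            Integer key = i * 8;
            plain[defaultPartition(key, numPartitions)]++;
            mixed[mixedPartition(key, numPartitions)]++;
        }

        System.out.println("plain: " + Arrays.toString(plain));
        System.out.println("mixed: " + Arrays.toString(mixed));
    }
}
```

With these keys the plain formula sends all 1000 records to partition 0, while the mixed variant spreads them over several partitions. The sketch also shows why such a change is not backward compatible: the same key can land in a different partition than it did before.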

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-12-05 Thread Radim Kolar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated MAPREDUCE-4827:
---

Attachment: betterhash2.txt

Changes it for the old mapred API as well.



[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-12-04 Thread Radim Kolar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated MAPREDUCE-4827:
---

Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-4827) Increase hash quality of HashPartitioner

2012-11-28 Thread Radim Kolar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radim Kolar updated MAPREDUCE-4827:
---

Attachment: betterhash1.txt
