[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537884#comment-13537884
 ] 

Hudson commented on MAPREDUCE-4887:
---

Integrated in Hadoop-Mapreduce-trunk #1291 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1291/])
MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor 
implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 
1424158)

 Result = SUCCESS
cutting : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java


> Rehashing partitioner for better distribution
> -
>
> Key: MAPREDUCE-4887
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Radim Kolar
> Assignee: Radim Kolar
> Fix For: 3.0.0
>
> Attachments: rehash1.txt, rehash2.txt, rehash3.txt, rehash4.txt
>
>
> rehash value returned by Object.hashCode() to get better distribution

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537011#comment-13537011
 ] 

Hudson commented on MAPREDUCE-4887:
---

Integrated in Hadoop-Hdfs-trunk #1260 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1260/])
MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor 
implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 
1424158)

 Result = FAILURE
cutting : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java




[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536958#comment-13536958
 ] 

Hudson commented on MAPREDUCE-4887:
---

Integrated in Hadoop-Yarn-trunk #71 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/71/])
MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor 
implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 
1424158)

 Result = SUCCESS
cutting : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java




[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536455#comment-13536455
 ] 

Hudson commented on MAPREDUCE-4887:
---

Integrated in Hadoop-trunk-Commit #3143 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3143/])
MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor 
implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 
1424158)

 Result = SUCCESS
cutting : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java




[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535606#comment-13535606
 ] 

Hadoop QA commented on MAPREDUCE-4887:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561627/rehash4.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3139//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3139//console

This message is automatically generated.

> Rehashing partitioner for better distribution
> -
>
> Key: MAPREDUCE-4887
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Radim Kolar
> Attachments: rehash1.txt, rehash2.txt, rehash3.txt, rehash4.txt
>
>
> rehash value returned by Object.hashCode() to get better distribution



[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-18 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535588#comment-13535588
 ] 

Radim Kolar commented on MAPREDUCE-4887:


The distribution is very smooth for the pattern. If you were not defending 
people who depend on undocumented behavior, you would make it the default. 

Dumping buckets distribution: min=902 avg=1043 max=1184
bucket 0 964 items, variance -0.07574304889741132
bucket 1 1042 items, variance -9.587727708533077E-4
bucket 2 1101 items, variance 0.05560882070949185
bucket 3 1039 items, variance -0.003835091083413231
bucket 4 1099 items, variance 0.053691275167785234
bucket 5 1044 items, variance 9.587727708533077E-4
bucket 6 998 items, variance -0.04314477468839885
bucket 7 1040 items, variance -0.0028763183125599234
bucket 8 1184 items, variance 0.13518696069031638
bucket 9 976 items, variance -0.06423777564717162
bucket 10 902 items, variance -0.13518696069031638
bucket 11 1124 items, variance 0.07766059443911794
bucket 12 931 items, variance -0.10738255033557047
bucket 13 1094 items, variance 0.0488974113135187
bucket 14 1152 items, variance 0.10450623202301054
bucket 15 977 items, variance -0.06327900287631831
bucket 16 1057 items, variance 0.013422818791946308
bucket 17 1048 items, variance 0.004793863854266539
bucket 18 1052 items, variance 0.00862895493767977
bucket 19 1042 items, variance -9.587727708533077E-4
bucket 20 1028 items, variance -0.014381591562799617
bucket 21 1038 items, variance -0.004793863854266539
bucket 22 1037 items, variance -0.005752636625119847
bucket 23 1040 items, variance -0.0028763183125599234
bucket 24 1084 items, variance 0.039309683604985615
bucket 25 974 items, variance -0.06615532118887824
bucket 26 954 items, variance -0.08533077660594439
bucket 27 1122 items, variance 0.07574304889741132
bucket 28 1009 items, variance -0.032598274209012464
bucket 29 1095 items, variance 0.04985618408437201
bucket 30 1109 items, variance 0.06327900287631831
bucket 31 978 items, variance -0.062320230105465
0 of 32 are too small or large buckets
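
A note on reading the dump above: in the figures shown, avg=1043 equals the midpoint of min and max, (902 + 1184) / 2, and each "variance" value equals (items - avg) / avg, so it is a relative deviation rather than a statistical variance. The following is a minimal sketch of that arithmetic only. The class and method names are hypothetical, and it is not the code that produced the dump.

```java
// Hypothetical helper, written only to illustrate the arithmetic in the dump.
final class BucketReport {

    /** Midpoint of the smallest and largest bucket, matching "avg" in the dump. */
    static int midpoint(int[] counts) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return (min + max) / 2;
    }

    /** Relative deviation of one bucket from the reference value. */
    static double relativeDeviation(int count, int reference) {
        return (count - reference) / (double) reference;
    }

    public static void main(String[] args) {
        // A few bucket sizes copied from the dump, including its min and max.
        int[] counts = {964, 1042, 1184, 902};
        int avg = midpoint(counts);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("bucket " + i + " " + counts[i]
                + " items, variance " + relativeDeviation(counts[i], avg));
        }
    }
}
```

The threshold behind the "too small or large buckets" line is not stated in the thread, so it is left out of the sketch.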


> Rehashing partitioner for better distribution
> -
>
> Key: MAPREDUCE-4887
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Radim Kolar
> Attachments: rehash1.txt, rehash2.txt, rehash3.txt
>
>
> rehash value returned by Object.hashCode() to get better distribution



[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-18 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535089#comment-13535089
 ] 

Radim Kolar commented on MAPREDUCE-4887:


HashPartitioner does not have unit tests either.

> Rehashing partitioner for better distribution
> -
>
> Key: MAPREDUCE-4887
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Radim Kolar
> Attachments: rehash1.txt, rehash2.txt
>
>
> rehash value returned by Object.hashCode() to get better distribution



[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534580#comment-13534580
 ] 

Hadoop QA commented on MAPREDUCE-4887:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12561407/rehash2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3130//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3130//console

This message is automatically generated.

> Rehashing partitioner for better distribution
> -
>
> Key: MAPREDUCE-4887
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Radim Kolar
> Attachments: rehash1.txt, rehash2.txt
>
>
> rehash value returned by Object.hashCode() to get better distribution



[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution

2012-12-17 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534470#comment-13534470
 ] 

Doug Cutting commented on MAPREDUCE-4887:
-

This looks like a good addition.  The javadoc might provide more detail, e.g., 
that a smoother partitioning may improve reduce time in some cases and should 
harm things in no cases, that this is suggested with Integer and Long keys with 
simple patterns in their distributions.
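
To illustrate why Integer keys with a simple pattern can partition badly, here is a minimal standalone sketch. It is not the committed RehashPartitioner source: the class and method names are hypothetical, the sketch does not depend on Hadoop, and the bit-mixing step is borrowed from the supplemental hash in JDK 6's java.util.HashMap as an assumed stand-in for whatever mixing the patch uses.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; not the committed RehashPartitioner source.
final class RehashSketch {

    /** Plain hash partitioning: non-negative hashCode modulo the partition count. */
    static int plainPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Folds high bits into low bits (shift constants as in JDK 6 HashMap.hash). */
    static int rehash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    /** Same as plainPartition, but on the rehashed value. */
    static int rehashPartition(Object key, int numPartitions) {
        return (rehash(key.hashCode()) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 32;
        int[] plain = new int[numPartitions];
        int[] mixed = new int[numPartitions];
        // Integer keys that are all multiples of 32: with plain hashing and
        // 32 partitions, every key lands in partition 0.
        for (int k = 0; k < 1000; k++) {
            Integer key = k * 32;
            plain[plainPartition(key, numPartitions)]++;
            mixed[rehashPartition(key, numPartitions)]++;
        }
        System.out.println("plain:    " + Arrays.toString(plain));
        System.out.println("rehashed: " + Arrays.toString(mixed));
    }
}
```

In a real job the same idea would live in a subclass of Hadoop's Partitioner and be selected on the job configuration. Those details are omitted here to keep the sketch self-contained.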

> Rehashing partitioner for better distribution
> -
>
> Key: MAPREDUCE-4887
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Radim Kolar
> Attachments: rehash1.txt
>
>
> rehash value returned by Object.hashCode() to get better distribution
