[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537884#comment-13537884 ]

Hudson commented on MAPREDUCE-4887:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1291 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1291/])
MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 1424158)

Result = SUCCESS
cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java

> Rehashing partitioner for better distribution
> ---------------------------------------------
>
>                 Key: MAPREDUCE-4887
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4887
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 3.0.0
>
>         Attachments: rehash1.txt, rehash2.txt, rehash3.txt, rehash4.txt
>
>
> rehash value returned by Object.hashCode() to get better distribution

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537011#comment-13537011 ]

Hudson commented on MAPREDUCE-4887:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #1260 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1260/])
MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 1424158)

Result = FAILURE
cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java
[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536958#comment-13536958 ]

Hudson commented on MAPREDUCE-4887:
-----------------------------------

Integrated in Hadoop-Yarn-trunk #71 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/71/])
MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 1424158)

Result = SUCCESS
cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java
[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536455#comment-13536455 ]

Hudson commented on MAPREDUCE-4887:
-----------------------------------

Integrated in Hadoop-trunk-Commit #3143 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3143/])
MAPREDUCE-4887. Add RehashPartitioner, to smooth distributions with poor implementations of Object#hashCode(). Contributed by Radim Kolar. (Revision 1424158)

Result = SUCCESS
cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1424158
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/RehashPartitioner.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestRehashPartitioner.java
[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535606#comment-13535606 ]

Hadoop QA commented on MAPREDUCE-4887:
--------------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12561627/rehash4.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3139//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3139//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535588#comment-13535588 ]

Radim Kolar commented on MAPREDUCE-4887:
----------------------------------------

Very smooth distribution for pattern. If you were not defending people depending on undocumented behavior, you would make it default.

Dumping buckets distribution: min=902 avg=1043 max=1184
bucket 0 964 items, variance -0.07574304889741132
bucket 1 1042 items, variance -9.587727708533077E-4
bucket 2 1101 items, variance 0.05560882070949185
bucket 3 1039 items, variance -0.003835091083413231
bucket 4 1099 items, variance 0.053691275167785234
bucket 5 1044 items, variance 9.587727708533077E-4
bucket 6 998 items, variance -0.04314477468839885
bucket 7 1040 items, variance -0.0028763183125599234
bucket 8 1184 items, variance 0.13518696069031638
bucket 9 976 items, variance -0.06423777564717162
bucket 10 902 items, variance -0.13518696069031638
bucket 11 1124 items, variance 0.07766059443911794
bucket 12 931 items, variance -0.10738255033557047
bucket 13 1094 items, variance 0.0488974113135187
bucket 14 1152 items, variance 0.10450623202301054
bucket 15 977 items, variance -0.06327900287631831
bucket 16 1057 items, variance 0.013422818791946308
bucket 17 1048 items, variance 0.004793863854266539
bucket 18 1052 items, variance 0.00862895493767977
bucket 19 1042 items, variance -9.587727708533077E-4
bucket 20 1028 items, variance -0.014381591562799617
bucket 21 1038 items, variance -0.004793863854266539
bucket 22 1037 items, variance -0.005752636625119847
bucket 23 1040 items, variance -0.0028763183125599234
bucket 24 1084 items, variance 0.039309683604985615
bucket 25 974 items, variance -0.06615532118887824
bucket 26 954 items, variance -0.08533077660594439
bucket 27 1122 items, variance 0.07574304889741132
bucket 28 1009 items, variance -0.032598274209012464
bucket 29 1095 items, variance 0.04985618408437201
bucket 30 1109 items, variance 0.06327900287631831
bucket 31 978 items, variance -0.062320230105465
0 of 32 are too small or large buckets
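The "variance" column in the bucket dump above is consistent with a relative deviation, (count - avg) / avg, taken against the reported avg of 1043; for bucket 0 that gives (964 - 1043) / 1043, roughly -0.0757. The following is a minimal sketch of that arithmetic only. The class and method names are hypothetical, and it is not taken from the test attached to the issue.

```java
/** Hypothetical helper; illustrates the arithmetic behind the dump, not the actual test code. */
final class BucketDeviation {

  /** Relative deviation of one bucket's item count from a reference bucket size. */
  static double relativeDeviation(int count, double reference) {
    return (count - reference) / reference;
  }

  public static void main(String[] args) {
    // A few bucket counts copied from the dump, including the min (902) and max (1184).
    int[] counts = {964, 1042, 1101, 1184, 902};
    // The "avg" value reported in the first line of the dump.
    double reference = 1043;
    for (int count : counts) {
      System.out.printf("%d items, deviation %.6f%n",
          count, relativeDeviation(count, reference));
    }
  }
}
```

Read this way, the largest buckets in the dump deviate by about 13.5% in either direction from the reference size.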
[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535089#comment-13535089 ]

Radim Kolar commented on MAPREDUCE-4887:
----------------------------------------

HashPartitioner does not have unit tests either.
[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534580#comment-13534580 ]

Hadoop QA commented on MAPREDUCE-4887:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12561407/rehash2.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3130//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3130//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4887) Rehashing partitioner for better distribution
[ https://issues.apache.org/jira/browse/MAPREDUCE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534470#comment-13534470 ]

Doug Cutting commented on MAPREDUCE-4887:
-----------------------------------------

This looks like a good addition. The javadoc might provide more detail, e.g., that a smoother partitioning may improve reduce time in some cases and should not harm it in any, and that this partitioner is suggested for Integer and Long keys with simple patterns in their distributions.
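The point about Integer and Long keys with simple patterns can be illustrated with a small standalone sketch. It assumes a baseline that takes the key's hashCode() modulo the number of partitions, and compares it with a variant that mixes high bits into low bits first. The class name, the SEED constant, and the helper methods are hypothetical; the shift pattern is borrowed from the bit-spreading step used by older java.util.HashMap implementations, and nothing here is confirmed to match the committed RehashPartitioner source. The sketch deliberately avoids Hadoop imports so that it runs on its own.

```java
import java.util.HashSet;
import java.util.Set;

/** Standalone illustration; not the committed RehashPartitioner. */
final class RehashSketch {

  // Arbitrary constant used to perturb the hash; an assumption of this sketch.
  private static final int SEED = 0x5bd1e995;

  /** Baseline: non-negative hashCode modulo the partition count. */
  static int plainPartition(Object key, int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  /** Variant: spread high bits into low bits before taking the modulo. */
  static int rehashPartition(Object key, int numPartitions) {
    int h = SEED ^ key.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    h ^= (h >>> 7) ^ (h >>> 4);
    return (h & Integer.MAX_VALUE) % numPartitions;
  }

  /** Counts distinct partitions hit by the keys 0, stride, 2*stride, ... */
  static int bucketsUsed(boolean rehash, int numKeys, int stride, int numPartitions) {
    Set<Integer> used = new HashSet<>();
    for (int i = 0; i < numKeys; i++) {
      Integer key = i * stride; // Integer.hashCode() is the int value itself
      used.add(rehash
          ? rehashPartition(key, numPartitions)
          : plainPartition(key, numPartitions));
    }
    return used.size();
  }

  public static void main(String[] args) {
    // Keys that are all multiples of 32, partitioned 32 ways.
    System.out.println("plain:  " + bucketsUsed(false, 10000, 32, 32) + " bucket(s) used");
    System.out.println("rehash: " + bucketsUsed(true, 10000, 32, 32) + " bucket(s) used");
  }
}
```

With keys that are all multiples of the partition count, the baseline sends every key to partition 0, so a single reducer would receive all the data; the mixing step lets the higher bits of the key influence the result, which spreads the same keys over more than one partition.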