[jira] Commented: (MAPREDUCE-1287) HashPartitioner calls hashCode() when there is only 1 reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795447#action_12795447 ]

Tom White commented on MAPREDUCE-1287:

Any reason that the old partitioner uses {{1 - numPartitions}} and the new one uses {{partitions - 1}}? It shouldn't make any difference, since the partitioner is not actually used in the zero-partition case, but it would be good to make the code consistent.

> Clearly, any application that depends on the partitioner for correctness can
> be rewritten, but is it worth calling out?

I think so - put a comment in the release notes.

> HashPartitioner calls hashCode() when there is only 1 reducer
>
>                 Key: MAPREDUCE-1287
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1287
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Ed Mazur
>            Assignee: Ed Mazur
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: M1287-4.patch, MAPREDUCE-1287.2.patch,
>                      MAPREDUCE-1287.3.patch, MAPREDUCE-1287.patch
>
> HashPartitioner could be optimized to not call the key's hashCode() if there
> is only 1 reducer.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
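Tom White's consistency question comes down to simple arithmetic: the two expressions agree whenever there is exactly one partition and differ only at zero. A minimal Java sketch, assuming these are the return expressions of the patch's old-API and new-API single-partition partitioners; the class and method names are hypothetical:

```java
// Hypothetical helper: the two return expressions from the review comment,
// written as static methods so they can be compared side by side.
final class SinglePartitionReturnValues {

    // Expression the old-API code path reportedly used.
    static int oldApi(int numPartitions) {
        return 1 - numPartitions;
    }

    // Expression the new-API code path reportedly used.
    static int newApi(int partitions) {
        return partitions - 1;
    }

    // True when both expressions pick the same partition for this count.
    static boolean agree(int partitions) {
        return oldApi(partitions) == newApi(partitions);
    }
}
```

With one reducer both methods return 0; with zero reducers they return 1 and -1. Since the partitioner is not consulted when there are no reducers, the difference only matters for consistency.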
[ https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792932#action_12792932 ]

Hadoop QA commented on MAPREDUCE-1287:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428502/M1287-4.patch
against trunk revision 892479.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/228/console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789323#action_12789323 ]

Ed Mazur commented on MAPREDUCE-1287:

I haven't run this, but here's a quick analysis:
- Cost: (_number of map output pairs_)*(_cost of "reducers == 1" check_)
- Gain: (_number of map output pairs_)*(_cost of key's hashCode()_), but only in the case of 1 reducer (no gain otherwise)

Your suggestion of moving this into the framework makes a lot of sense. That way you only have to check for the 1-reducer case once, when you assign the partitioner, rather than for every map output, which essentially eliminates the cost of the optimization.
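Ed Mazur's cost/gain trade-off can be made concrete with a sketch of the per-record check, using a key type that counts its own hashCode() calls. Everything here is a hypothetical stand-in: SimplePartitioner only mirrors the shape of getPartition(key, value, numPartitions), and ShortCircuitHashPartitioner and CountingKey are illustration names, not Hadoop classes. The hash rule (key.hashCode() & Integer.MAX_VALUE) % numPartitions is the one HashPartitioner uses.

```java
// Stand-in for the partitioner contract; mirrors the shape of Hadoop's
// getPartition(key, value, numPartitions) but is NOT the Hadoop type.
interface SimplePartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Hypothetical hash partitioner with the per-record "1 reducer" check.
final class ShortCircuitHashPartitioner<K, V> implements SimplePartitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numPartitions) {
        // Cost: this comparison runs once for every map output pair.
        if (numPartitions == 1) {
            // Gain: hashCode() is skipped, but only when there is one reducer.
            return 0;
        }
        // HashPartitioner's rule: clear the sign bit, then take the modulus.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Hypothetical key type that counts hashCode() calls so the saving is visible.
final class CountingKey {
    static int hashCalls = 0;
    private final String text;

    CountingKey(String text) {
        this.text = text;
    }

    @Override
    public int hashCode() {
        hashCalls++;
        return text.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CountingKey && ((CountingKey) o).text.equals(text);
    }
}
```

With one reducer, every record pays for the comparison and none pays for hashCode(); with several reducers the comparison is pure overhead, which is the argument for choosing the partitioner once in the framework instead.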
[ https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788792#action_12788792 ]

Tom White commented on MAPREDUCE-1287:

How large a performance gain does this change give?

This might be better done in the framework, by using a special partitioner in the single-reducer case: a class called, say, SinglePartitionPartitioner whose getPartition() method always returns 0.
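A minimal sketch of the framework-level alternative, assuming a hypothetical PartitionerChooser hook that runs once per task. SinglePartitionPartitioner is the name floated in the comment; SimplePartitioner and PlainHashPartitioner are stand-ins, not the Hadoop API.

```java
// Stand-in for the partitioner contract (NOT the Hadoop type).
interface SimplePartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// The partitioner suggested in the comment: every record goes to partition 0.
final class SinglePartitionPartitioner<K, V> implements SimplePartitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numPartitions) {
        return 0; // no hashCode() call and no per-record branch
    }
}

// Plain hash partitioning, same rule as HashPartitioner, with no special case.
final class PlainHashPartitioner<K, V> implements SimplePartitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Hypothetical framework hook: decide once per task, not once per record.
final class PartitionerChooser {
    static <K, V> SimplePartitioner<K, V> choose(int numReduceTasks,
                                                 SimplePartitioner<K, V> configured) {
        if (numReduceTasks > 1) {
            return configured; // the job's own partitioner is used unchanged
        }
        return new SinglePartitionPartitioner<>();
    }
}
```

Because the decision is made once, the per-record path has neither the `== 1` branch nor the hashCode() call. Treating zero reducers the same as one is harmless in this sketch, because a map-only job never consults the partitioner.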
[ https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788433#action_12788433 ]

Hadoop QA commented on MAPREDUCE-1287:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12427524/MAPREDUCE-1287.3.patch
against trunk revision 888761.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/312/console

This message is automatically generated.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788320#action_12788320 ]

Todd Lipcon commented on MAPREDUCE-1287:

Can you also make this change to the old-API HashPartitioner?
src/java/org/apache/hadoop/mapred/lib/HashPartitioner.java

- Putting the "else" on a separate line from the '}' differs from the usual Hadoop style. Also, a space after 'if' is the usual style.
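The review's two style points, shown on the same single-reducer short circuit ported to an old-API-shaped partitioner. OldApiPartitioner is a hypothetical stand-in (the real old-API Partitioner also extends JobConfigurable, which is omitted here to keep the sketch self-contained), and the class name is illustrative only.

```java
// Hypothetical stand-in for the old-API partitioner shape
// (NOT org.apache.hadoop.mapred.Partitioner).
interface OldApiPartitioner<K2, V2> {
    int getPartition(K2 key, V2 value, int numPartitions);
}

// Illustrative old-API-style partitioner formatted per the review comments.
final class OldApiShortCircuitPartitioner<K2, V2> implements OldApiPartitioner<K2, V2> {
    @Override
    public int getPartition(K2 key, V2 value, int numPartitions) {
        if (numPartitions == 1) {      // a space after 'if'
            return 0;
        } else {                       // 'else' on the same line as the '}'
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }
}
```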