[ https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750951#action_12750951 ]
Iyappan Srinivasan commented on MAPREDUCE-318:
----------------------------------------------

+1 for testing.

Cluster conf: mapred.child.java.opts 512M and io.sort.factor 100. Namenode heap size is 3GB and jobtracker heap size is 1GB (see the configuration sketch at the end of this message).

Some benchmarking and functionality test results:

1) Default sort on a 94 node cluster:
   trunk, two attempts: 1) 2376 seconds 2) 2589 seconds
   with patch, last two attempts: 1) 1408 seconds 2) 1381 seconds
2) loadgen on a 94 node cluster:
   trunk: 57 minutes
   with patch, two attempts: 1) 56 minutes 9 seconds 2) 56 minutes 23 seconds
3) gridmix2 on a 491 node cluster:
   trunk: 1 hour 7 minutes
   with patch, two attempts: 1) 57 minutes 2) 47 minutes
4) Sort (with memory-to-memory enabled): passed
5) After starting the job with mapred.reduce.slowstart.completed.maps=1, remove some intermediate map output and corrupt some map output. Verify that only those tasks are rerun.

> Refactor reduce shuffle code
> ----------------------------
>
>                 Key: MAPREDUCE-318
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch
>
>
> The reduce shuffle code has become very complex and entangled. I think we should move it out of ReduceTask and into a separate package (org.apache.hadoop.mapred.task.reduce). Details to follow.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
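For reference, a minimal sketch of how the test configuration described above could be set programmatically with the old-style JobConf API. The property names and values (512M child heap, io.sort.factor 100, slowstart at 1.0 for test 5) come from the comment; the class name and the use of -Xmx for the 512M setting are assumptions for illustration, not the exact original test setup.

{code:java}
import org.apache.hadoop.mapred.JobConf;

public class ShuffleTestConf {

  // Builds a JobConf roughly matching the benchmark/test settings above.
  public static JobConf createConf() {
    JobConf conf = new JobConf();

    // 512M for the child task JVMs; assuming this was passed as a max-heap flag.
    conf.set("mapred.child.java.opts", "-Xmx512m");

    // Merge up to 100 spill segments/streams at once during sort/shuffle merges.
    conf.setInt("io.sort.factor", 100);

    // For test 5: reducers start fetching only after all maps have completed.
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 1.0f);

    return conf;
  }
}
{code}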