[ https://issues.apache.org/jira/browse/MAPREDUCE-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884237#action_12884237 ]

Hadoop QA commented on MAPREDUCE-1838:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12448455/MAPREDUCE-1838.patch
  against trunk revision 959509.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/277/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/277/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/277/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/277/console

This message is automatically generated.

> DistRaid map tasks have large variance in running times
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-1838
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1838
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>    Affects Versions: 0.20.1
>            Reporter: Ramkumar Vadali
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1838.patch
>
>
> HDFS RAID uses map-reduce jobs to generate parity files for a set of source
> files. Each map task gets a subset of files to operate on. The current code
> assigns files by walking through the list of files given in the constructor
> of DistRaid.
> The problem is that the list of files given to the constructor is in (pretty
> much) directory-listing order. When a large number of files is added,
> neighboring files in that order tend to have similar sizes. Thus one map task
> can end up with only large files whereas another ends up with only small
> files, increasing the variance in run times.
> We could do smarter assignment by using the file sizes.
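
One way to picture "smarter assignment by using the file sizes" is a greedy
largest-first scheme: sort the files by size, then hand each file to whichever
map task currently has the fewest bytes. The sketch below is only an
illustration of that idea. It is not taken from the attached patch, and the
class and method names (SizeBalancedAssigner, assign, loadOf) are hypothetical
and do not exist in DistRaid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of DistRaid or the patch.
class SizeBalancedAssigner {

    /** Returns, for each task, the indices (into fileSizes) of its files. */
    static List<List<Integer>> assign(long[] fileSizes, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        // Visit files from largest to smallest (stable sort keeps ties in list order).
        Integer[] order = new Integer[fileSizes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(fileSizes[b], fileSizes[a]));

        List<List<Integer>> tasks = new ArrayList<>();
        long[] load = new long[numTasks];
        // Min-heap of task ids keyed by bytes assigned so far, ties broken by id.
        PriorityQueue<Integer> leastLoaded = new PriorityQueue<>(
            Comparator.<Integer>comparingLong(t -> load[t]).thenComparingInt(t -> t));
        for (int t = 0; t < numTasks; t++) {
            tasks.add(new ArrayList<>());
            leastLoaded.add(t);
        }
        for (int file : order) {
            int t = leastLoaded.poll();  // remove the task before changing its key
            tasks.get(t).add(file);
            load[t] += fileSizes[file];
            leastLoaded.add(t);          // re-insert with the updated load
        }
        return tasks;
    }

    /** Total bytes of the given files. */
    static long loadOf(List<Integer> files, long[] fileSizes) {
        long total = 0;
        for (int f : files) {
            total += fileSizes[f];
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sizes: large files first, then small ones, as a directory
        // listing might produce after a bulk add.
        long[] sizes = {900, 850, 800, 10, 20, 30};
        List<List<Integer>> tasks = assign(sizes, 3);
        for (int t = 0; t < tasks.size(); t++) {
            System.out.println("task " + t + ": files " + tasks.get(t)
                + ", bytes " + loadOf(tasks.get(t), sizes));
        }
    }
}
```

With the made-up sizes in main, the three tasks receive 900, 860 and 850
bytes. Cutting the same list into consecutive pairs would instead give 1750,
810 and 50 bytes, which is the kind of imbalance the issue describes. A real
change would also have to account for how DistRaid actually forms its input
splits, which this sketch does not model.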