[ https://issues.apache.org/jira/browse/HADOOP-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Senthil Subramanian updated HADOOP-1440:
----------------------------------------

    Attachment: HADOOP-1440_1.patch

This patch implements the solution proposed by Doug:

>> 1. Use the order returned from getSplits() to determine the map name, and
>> hence the output names when reduce is disabled.
>> 2. Continue to sort by the length of the input to determine task execution
>> order.

> JobClient should not sort input-splits
> --------------------------------------
>
>                 Key: HADOOP-1440
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1440
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Milind Bhandarkar
>             Fix For: 0.14.0
>
>         Attachments: HADOOP-1440_1.patch
>
>
> Currently, the JobClient sorts the InputSplits returned by InputFormat in
> descending order of length, so that the map tasks corresponding to larger
> input-splits are scheduled for execution before smaller ones. However, this
> causes problems in applications that use -reducer NONE to produce data-sets
> partitioned in the same way as the input.
> With -reducer NONE, map task i produces part-i. However, in the typical
> applications that use -reducer NONE, a map should produce a partition that has
> the same index as the input partition it read.
> (Of course, this requires that each partition be fed in its entirety
> to a single map, rather than being split into blocks, but that is a separate
> issue.)
> Thus, sorting of input splits should either be controllable via a
> configuration variable, or the FileInputFormat should sort the splits and
> JobClient should honor the order of splits.
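
The two points of the proposed solution (name map outputs by the order returned from getSplits(), but still schedule by input length) can be illustrated with a small stand-alone sketch. This is not the code in HADOOP-1440_1.patch and it does not use the Hadoop API: the classes Split and MapTask and the method planTasks are hypothetical names invented for illustration, and the "part-" + index naming simply mirrors the "part-i" wording of the issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitOrderSketch {

    /** Hypothetical stand-in for an input split; only its length matters here. */
    static final class Split {
        final String path;
        final long length;

        Split(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** A map task whose partition index is fixed before any sorting happens. */
    static final class MapTask {
        final int partition; // position of the split in the original input order
        final Split split;

        MapTask(int partition, Split split) {
            this.partition = partition;
            this.split = split;
        }

        /** Output name in the "part-i" notation used in the issue text. */
        String outputName() {
            return "part-" + partition;
        }
    }

    /**
     * Step 1: assign each task the index its split had in the input order.
     * Step 2: sort the tasks by split length, longest first, for scheduling.
     * The sort changes execution order only; the partition index is kept.
     */
    static List<MapTask> planTasks(List<Split> splitsInInputOrder) {
        List<MapTask> tasks = new ArrayList<>();
        for (int i = 0; i < splitsInInputOrder.size(); i++) {
            tasks.add(new MapTask(i, splitsInInputOrder.get(i)));
        }
        // List.sort is stable, so splits of equal length keep their input order.
        tasks.sort(Comparator.comparingLong((MapTask t) -> t.split.length).reversed());
        return tasks;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("a", 10));
        splits.add(new Split("b", 30));
        splits.add(new Split("c", 20));

        // Tasks are listed in execution order; names still follow input order.
        for (MapTask t : planTasks(splits)) {
            System.out.println(t.outputName() + " <- " + t.split.path
                    + " (" + t.split.length + " bytes)");
        }
    }
}
```

The point of the sketch is that the index is captured before sorting, so the largest split can run first while its output still carries the index of the input partition it came from.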