JobClient should not sort input-splits
--------------------------------------

                 Key: HADOOP-1440
                 URL: https://issues.apache.org/jira/browse/HADOOP-1440
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.12.3
         Environment: All
            Reporter: Milind Bhandarkar
             Fix For: 0.14.0


Currently, the JobClient sorts the InputSplits returned by the InputFormat in
descending order of size, so that map tasks for larger input splits are
scheduled for execution before those for smaller ones. However, this causes
problems for applications that run with -reducer NONE and produce data sets
partitioned the same way as their input.
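To make the current behaviour concrete, here is a minimal Java sketch. It
assumes a hypothetical Split type that records only a split's position in the
InputFormat's output and its length; it is not Hadoop's InputSplit class. The
sketch sorts largest-first and reports which input partition each map task
index ends up reading.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of the sorting described above: splits are ordered by length,
 * largest first, and map task i is assigned the i-th split after sorting.
 */
public class SortedSplitsDemo {

    /** Hypothetical split: position in the InputFormat's output and length in bytes. */
    record Split(int inputIndex, long length) {}

    /** Returns a copy of the splits ordered by descending length. */
    static List<Split> sortLargestFirst(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong(Split::length).reversed());
        return sorted;
    }

    /** For each map task index, the input partition that task actually reads. */
    static int[] inputIndexPerTask(List<Split> scheduled) {
        return scheduled.stream().mapToInt(Split::inputIndex).toArray();
    }

    public static void main(String[] args) {
        // Three input partitions of unequal size, in the order the InputFormat returned them.
        List<Split> splits = List.of(new Split(0, 100), new Split(1, 300), new Split(2, 200));
        int[] mapping = inputIndexPerTask(sortLargestFirst(splits));
        System.out.println(Arrays.toString(mapping)); // prints [1, 2, 0]
    }
}
```

With partitions of 100, 300 and 200 bytes, map task 0 reads input partition 1,
so the task order no longer matches the input order.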

With -reducer NONE, map task i writes part-i, where i is the task's position
after sorting. However, typical applications that use -reducer NONE need each
output partition to have the same index as the input partition it was
computed from.
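The mismatch can be checked mechanically. The sketch below is hypothetical and
independent of Hadoop's classes: it takes mapping[i] as the input partition
read by map task i, names each output part-i as described above, and lists
every output whose index differs from the input partition it holds.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputAlignmentCheck {

    /** Output name written by map task i when the job runs with -reducer NONE. */
    static String outputName(int taskIndex) {
        return "part-" + taskIndex;
    }

    /**
     * Lists every output whose index differs from its input partition.
     * mapping[i] is the input partition read by map task i.
     */
    static List<String> mismatches(int[] mapping) {
        List<String> out = new ArrayList<>();
        for (int task = 0; task < mapping.length; task++) {
            if (mapping[task] != task) {
                out.add(outputName(task) + " holds input partition " + mapping[task]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Task order after a largest-first sort of partitions sized 100, 300, 200.
        System.out.println(mismatches(new int[] {1, 2, 0}));
        // Input order preserved: every part-i comes from input partition i, so nothing is listed.
        System.out.println(mismatches(new int[] {0, 1, 2}));
    }
}
```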

(Of course, this requires that each partition should be fed in its entirety to 
a map, rather than splitting it into blocks, but that is a separate issue.)

Thus, sorting of input splits should either be controllable via a
configuration variable, or FileInputFormat should sort the splits itself and
JobClient should honor the order in which they are returned.
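A minimal sketch of the first option follows. It assumes a hypothetical boolean
property, mapred.jobclient.sort.splits (not an existing Hadoop key), read from
a java.util.Properties object standing in for the job configuration. Sorting
stays on by default here to preserve the current behaviour; setting the
property to false keeps the order returned by the InputFormat, which is also
what the second option requires of JobClient.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Properties;

public class SplitOrderPolicy {

    /** Hypothetical property name; not an existing Hadoop configuration key. */
    static final String SORT_SPLITS_KEY = "mapred.jobclient.sort.splits";

    /** Hypothetical split: position in the InputFormat's output and length in bytes. */
    record Split(int inputIndex, long length) {}

    /** Returns the order in which map tasks would be created for these splits. */
    static List<Split> scheduleOrder(List<Split> fromInputFormat, Properties conf) {
        // Assumed default: keep sorting unless the job explicitly disables it.
        boolean sort = Boolean.parseBoolean(conf.getProperty(SORT_SPLITS_KEY, "true"));
        List<Split> order = new ArrayList<>(fromInputFormat);
        if (sort) {
            // List.sort is stable, so equal-length splits keep their InputFormat order.
            order.sort(Comparator.comparingLong(Split::length).reversed());
        }
        return order;
    }

    /** Input partition index for each map task, in task order. */
    static int[] indices(List<Split> order) {
        return order.stream().mapToInt(Split::inputIndex).toArray();
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(new Split(0, 100), new Split(1, 300), new Split(2, 200));
        Properties conf = new Properties();
        System.out.println(Arrays.toString(indices(scheduleOrder(splits, conf)))); // prints [1, 2, 0]
        conf.setProperty(SORT_SPLITS_KEY, "false");
        System.out.println(Arrays.toString(indices(scheduleOrder(splits, conf)))); // prints [0, 1, 2]
    }
}
```

Because the sort is stable, splits of equal length keep their InputFormat order
even when sorting is enabled.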

-- 
This message is automatically generated by JIRA.