JobClient should not sort input-splits
--------------------------------------
Key: HADOOP-1440
URL: https://issues.apache.org/jira/browse/HADOOP-1440
Project: Hadoop
Issue Type: Improvement
Components: mapred
Affects Versions: 0.12.3
Environment: All
Reporter: Milind Bhandarkar
Fix For: 0.14.0
Currently, the JobClient sorts the InputSplits returned by InputFormat in
descending order of size, so that the map tasks for larger input-splits
are scheduled for execution before those for smaller ones. However, this
causes problems in applications that use -reducer NONE to produce data-sets
partitioned the same way as the input.
With -reducer NONE, map task i produces part-i. However, in the typical
applications that use -reducer NONE, each map task should produce the output
partition that has the same index as its input partition.
(Of course, this requires that each partition should be fed in its entirety to
a map, rather than splitting it into blocks, but that is a separate issue.)
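
The mismatch described above can be illustrated with a small, self-contained
sketch. It does not use any Hadoop classes: Split, SplitOrderDemo, and the
split sizes are hypothetical stand-ins invented for this example, and the
sort only imitates the size-descending ordering this issue attributes to
JobClient.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an input split: the index of the input
// partition it came from and its size in bytes. Not a Hadoop class.
final class Split {
    final int partition;
    final long length;

    Split(int partition, long length) {
        this.partition = partition;
        this.length = length;
    }
}

class SplitOrderDemo {
    // Returns a copy ordered by size, largest first, imitating the
    // ordering described in this issue.
    static List<Split> sortedBySizeDescending(List<Split> splits) {
        List<Split> copy = new ArrayList<>(splits);
        copy.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        return copy;
    }

    // With no reducer, map task i writes part-i. This lists which input
    // partition ends up in each output file for a given task order.
    static List<String> outputMapping(List<Split> taskOrder) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < taskOrder.size(); i++) {
            lines.add("part-" + i + " <- input partition " + taskOrder.get(i).partition);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sizes: partition 1 is the largest, partition 0 the smallest.
        List<Split> splits = List.of(
                new Split(0, 100L), new Split(1, 300L), new Split(2, 200L));
        outputMapping(sortedBySizeDescending(splits)).forEach(System.out::println);
    }
}
```

With these made-up sizes, input partition 1 is the largest, so it becomes map
task 0 and its output is written as part-0 instead of part-1.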
Thus, either the sorting of input splits should be controllable via a
configuration variable, or the FileInputFormat should sort the splits and
JobClient should honor the order of splits.
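
A minimal sketch of the first option, assuming a boolean switch read from the
job configuration. java.util.Properties stands in for the configuration
object, and the key name example.sort.input.splits is invented for this
illustration; it is not an existing Hadoop property. Split sizes are
represented as plain Long values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Properties;

class SplitOrdering {
    // Hypothetical key, invented for this sketch; not a real Hadoop property.
    static final String SORT_KEY = "example.sort.input.splits";

    // Returns the split lengths in the order map tasks would be numbered.
    static List<Long> order(List<Long> splitLengths, Properties conf) {
        List<Long> result = new ArrayList<>(splitLengths);
        // Assumption: default to "true" so existing jobs keep the sorted order.
        boolean sort = Boolean.parseBoolean(conf.getProperty(SORT_KEY, "true"));
        if (sort) {
            // Largest split first.
            result.sort(Comparator.reverseOrder());
        }
        // When sorting is disabled, the order from the input format is kept.
        return result;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SORT_KEY, "false");
        System.out.println(order(List.of(100L, 300L, 200L), conf));
    }
}
```

Defaulting the switch to the sorted behavior would leave jobs that rely on
largest-first scheduling unaffected, while jobs that need index-preserving
output could turn it off.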