[ https://issues.apache.org/jira/browse/HADOOP-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J resolved HADOOP-960.
----------------------------

    Resolution: Invalid

The ability to specify "mapred.map.tasks" is going away with the new MR API. The only _right_ way to control splits is to write your own InputFormat that computes them the way you need. The default implementation has worked well for many users: it is data-locality aware (as long as that information is available), its split size is tunable, and it can be made to process whole files with a very simple subclass or configuration change.

Resolving as Invalid (now and going forward), since InputFormat#getSplits(…) is not going anywhere and can do what you want.

Regarding splits by number of records: MR now also has NLineInputFormat, which does open and read through the file to build its splits.

> Incorrect number of map tasks when there are multiple input files
> -----------------------------------------------------------------
>
>                 Key: HADOOP-960
>                 URL: https://issues.apache.org/jira/browse/HADOOP-960
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.10.1
>            Reporter: Andrew McNabb
>            Priority: Minor
>
> This problem happens with hadoop-streaming and possibly elsewhere. If there
> are 5 input files, it will create 130 map tasks, even if
> mapred.map.tasks=128. The number of map tasks is incorrectly set to a
> multiple of the number of files. (I wrote a much more complete bug report,
> but Jira lost it when it had an error, so I'm not in the mood to write it all
> again.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
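The 130-task count in the report can be reproduced with a small model of the split arithmetic. This is a sketch under stated assumptions. It follows the logic of `FileInputFormat.getSplits()` in later releases of the old `mapred` API: a goal size of total input bytes divided by the "mapred.map.tasks" hint, clamped between a minimum split size and the block size, plus a 1.1 "slop" factor for the last split. It does not necessarily match the 0.10.1 code. All function names are hypothetical, and the files are assumed to be non-empty. Because each file is split independently, a fractional remainder in each file becomes one extra split. Five equal files that each need 25.6 goal-sized splits therefore produce 26 apiece.

```python
# Hypothetical model of old-API FileInputFormat split counting.
# Assumption: mirrors later Hadoop releases (goalSize + 1.1 split slop),
# not necessarily 0.10.1. Files are assumed non-empty.

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def splits_per_file(file_len, split_size):
    """Count the splits a single file produces for a given split size."""
    count = 0
    remaining = file_len
    # Carve off full-size splits while the rest is clearly bigger than one split.
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    # Whatever is left (possibly a short tail) becomes one final split.
    if remaining != 0:
        count += 1
    return count


def count_splits(file_lens, num_map_tasks, block_size, min_size=1, splitable=True):
    """Total splits across all files; num_map_tasks is only a hint."""
    if not splitable:
        # e.g. an InputFormat subclass that refuses to split: one split per file
        return len(file_lens)
    goal_size = sum(file_lens) // max(num_map_tasks, 1)
    split_size = max(min_size, min(goal_size, block_size))
    # Each file is split on its own, so per-file remainders add up.
    return sum(splits_per_file(n, split_size) for n in file_lens)


def nline_splits(line_counts, lines_per_split):
    """Rough model of NLineInputFormat: one split per N lines of each file."""
    # Ceiling division: a partial group of leftover lines still gets a split.
    return sum(-(-c // lines_per_split) for c in line_counts)


if __name__ == "__main__":
    block = 64 * 1024 * 1024
    print(count_splits([128_000] * 5, 128, block))  # five files -> 130
    print(count_splits([640_000], 128, block))  # same bytes in one file -> 128
    print(count_splits([128_000] * 5, 128, block, splitable=False))  # -> 5
```

With the same total input held in a single file, the model yields exactly 128 splits. This illustrates why the count depends on how the input is divided across files, and why the resolution points to a custom InputFormat rather than "mapred.map.tasks".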