Yes, there can be more than one InputSplit per SequenceFile. The file is split roughly along HDFS block boundaries (64 MB by default). The actual edges of each split are adjusted to the next sync point between key-value records, so a split may be a few kilobytes off the block boundary.
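
To make the split arithmetic concrete, here is a rough sketch of how the split size falls out of the block size, the requested map count, and a minimum split size. It is a simplified model written from my recollection of the old-API FileInputFormat logic, not the Hadoop source: the class name, the method names, and the 1.1 slop factor are assumptions for illustration, and it ignores the sync-point adjustment entirely.

```java
// Simplified model of file split sizing; NOT the actual Hadoop source.
// SplitSizeSketch, computeSplitSize and countSplits are hypothetical names.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Assumed tolerance: the last split may run up to 10% over the split size.
    static final double SPLIT_SLOP = 1.1;

    // The requested map count only enters through goalSize, and the
    // block size caps it. The minimum split size can push it back up.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts how many splits a single file of fileSize bytes would produce.
    // minSize is expected to be at least 1.
    static int countSplits(long fileSize, int requestedMaps, long minSize, long blockSize) {
        long goalSize = fileSize / Math.max(1, requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // final, possibly short, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long file = 1024 * MB;
        long block = 64 * MB;
        System.out.println(countSplits(file, 2, 1, block));        // ask for 2 maps
        System.out.println(countSplits(file, 100, 1, block));      // ask for 100 maps
        System.out.println(countSplits(file, 2, 256 * MB, block)); // raise the minimum split size
    }
}
```

Under this model, a 1 GB file with 64 MB blocks yields 16 splits even when only 2 maps are requested, because the goal size is capped at the block size. Requesting 100 maps shrinks the splits and yields 100, and raising the minimum split size to 256 MB yields 4. In other words, asking for more maps can increase the split count, but getting fewer splits than blocks requires raising the minimum split size (mapred.min.split.size, if I remember the property name correctly).
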
SequenceFileInputFormat treats mapred.map.tasks (conf.setNumMapTasks()) as a hint, not a hard setting. (The number of reduce tasks, though, is always 100% user-controlled.) If you need exact control over the number of map tasks, you'll need to subclass the input format and change how it computes its splits.

That said, are you sure you actually need to control this value precisely? Or is it enough to know how many splits were created?

- Aaron

On Sun, Apr 19, 2009 at 7:23 PM, Barnet Wagman <b.wag...@comcast.net> wrote:

> Suppose a SequenceFile (containing keys and values that are BytesWritable)
> is used as input. Will it be divided into InputSplits? If so, what's the
> criteria used for splitting?
>
> I'm interested in this because I need to control the number of map tasks
> used, which (if I understand it correctly), is equal to the number of
> InputSplits.
>
> thanks,
>
> bw