In addition to what Aaron mentioned, you can configure the minimum split size (the mapred.min.split.size property in hadoop-site.xml) to get smaller or larger input splits, depending on your application.
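For example, to get fewer, larger splits, something along these lines should work -- the 128 MB figure below is just an illustration, tune it to your data:

<property>
  <name>mapred.min.split.size</name>
  <value>134217728</value>  <!-- 128 MB; purely an example value -->
</property>

or per job, in your driver:

// Same setting, applied programmatically to a single job.
JobConf conf = new JobConf();
conf.setLong("mapred.min.split.size", 128 * 1024 * 1024);

(If it turns out you only need to know how many splits will be created, see the sketch below the quoted thread.)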
-Jim

On Mon, Apr 20, 2009 at 12:18 AM, Aaron Kimball <aa...@cloudera.com> wrote:

> Yes, there can be more than one InputSplit per SequenceFile. The file will
> be split more-or-less along 64 MB boundaries. (the actual "edges" of the
> splits will be adjusted to hit the next block of key-value pairs, so it
> might be a few kilobytes off.)
>
> The SequenceFileInputFormat regards mapred.map.tasks (conf.setNumMapTasks())
> as a hint, not a set-in-stone metric. (The number of reduce tasks, though,
> is always 100% user-controlled.) If you need exact control over the number
> of map tasks, you'll need to subclass it and modify this behavior. That
> having been said -- are you sure you actually need to precisely control
> this value? Or is it enough to know how many splits were created?
>
> - Aaron
>
> On Sun, Apr 19, 2009 at 7:23 PM, Barnet Wagman <b.wag...@comcast.net> wrote:
>
> > Suppose a SequenceFile (containing keys and values that are BytesWritable)
> > is used as input. Will it be divided into InputSplits? If so, what's the
> > criteria used for splitting?
> >
> > I'm interested in this because I need to control the number of map tasks
> > used, which (if I understand it correctly), is equal to the number of
> > InputSplits.
> >
> > thanks,
> >
> > bw
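P.S. Re: Aaron's question about whether it's enough to just know how many splits will be created -- you can ask the input format directly from your driver before submitting the job. Untested sketch against the old (mapred) API; the class name and the use of args[0] for the input path are just placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class CountSplits {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(CountSplits.class);
    // args[0] should point at your SequenceFile(s) in HDFS.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));

    SequenceFileInputFormat<BytesWritable, BytesWritable> inFormat =
        new SequenceFileInputFormat<BytesWritable, BytesWritable>();

    // The second argument is the requested number of maps, which getSplits()
    // treats only as a hint, as Aaron described.
    InputSplit[] splits = inFormat.getSplits(conf, conf.getNumMapTasks());
    System.out.println("Number of input splits: " + splits.length);
  }
}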