In addition to what Aaron mentioned, you can set the minimum split size
(mapred.min.split.size) in hadoop-site.xml to get larger input splits, and
therefore fewer map tasks, than the default, if your application needs that.
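
Roughly, the split size works out as max(minSize, min(goalSize, blockSize)),
where goalSize is the total input size divided by the requested number of map
tasks. The sketch below is modeled on the old
org.apache.hadoop.mapred.FileInputFormat.getSplits() logic that
SequenceFileInputFormat inherits, including its 1.1 "slop" factor for the last
split. It is simplified: it handles a single file, and the class name, method
names, 1 GB file and 64 MB block size are all hypothetical. It only uses the
plain JDK, so it runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of old-API FileInputFormat split sizing (not the real Hadoop class).
class SplitSizeSketch {
    // The last split may be up to 10% larger than splitSize instead of leaving a tiny tail.
    static final double SPLIT_SLOP = 1.1;

    // The map-task hint sets goalSize, the block size caps it, the minimum split size floors it.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Returns the lengths of the splits that would be created for one file.
    static List<Long> splitLengths(long fileLength, long blockSize,
                                   int numSplitsHint, long minSplitSize) {
        long goalSize = fileLength / Math.max(numSplitsHint, 1);
        long splitSize = computeSplitSize(goalSize, Math.max(minSplitSize, 1), blockSize);
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 1024 * mb; // hypothetical 1 GB SequenceFile
        long block = 64 * mb;  // 64 MB blocks
        // Asking for only 2 map tasks: the block size caps the split size, so one split per block.
        System.out.println(splitLengths(file, block, 2, 1).size());
        // Asking for 100 map tasks: goalSize (~10 MB) is below the block size, so ~100 splits.
        System.out.println(splitLengths(file, block, 100, 1).size());
        // Raising the minimum split size to 256 MB forces fewer, larger splits.
        System.out.println(splitLengths(file, block, 2, 256 * mb).size());
    }
}
```

With these numbers it should print 16, 100 and 4: asking for fewer map tasks
than there are blocks doesn't reduce the count, while raising the minimum split
size does.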

-Jim

On Mon, Apr 20, 2009 at 12:18 AM, Aaron Kimball <aa...@cloudera.com> wrote:

> Yes, there can be more than one InputSplit per SequenceFile. The file will
> be split more or less along 64 MB boundaries (the default HDFS block size).
> The actual edges of the splits are adjusted to the next sync point between
> blocks of key-value pairs, so they may be a few kilobytes off.
>
> SequenceFileInputFormat treats mapred.map.tasks (conf.setNumMapTasks()) as
> a hint, not a hard limit. (The number of reduce tasks, though, is always
> 100% user-controlled.) If you need exact control over the number of map
> tasks, you'll need to subclass it and modify this behavior. That said, are
> you sure you actually need to control this value precisely? Or is it
> enough to know how many splits were created?
>
> - Aaron
>
> On Sun, Apr 19, 2009 at 7:23 PM, Barnet Wagman <b.wag...@comcast.net> wrote:
>
> > Suppose a SequenceFile (containing keys and values that are BytesWritable)
> > is used as input. Will it be divided into InputSplits? If so, what
> > criteria are used for splitting?
> >
> > I'm interested in this because I need to control the number of map tasks
> > used, which, if I understand correctly, is equal to the number of
> > InputSplits.
> >
> > thanks,
> >
> > bw
> >
>
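
To make the "few kilobytes off" point concrete, here is a simplified model of
the boundary adjustment. It assumes (a simplification, not Hadoop's actual
reader code) that the reader for each split skips forward to the first sync
boundary at or after the split's nominal start and keeps reading until it
reaches the first boundary at or after its nominal end. The class name, the
boundary offsets and the 100-byte split size are hypothetical small numbers
chosen to keep the output readable.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of split edges snapping to sync boundaries (not Hadoop's reader code).
class SyncBoundarySketch {

    // First boundary offset >= pos (boundaries must be sorted ascending), or EOF if none.
    static long firstBoundaryAtOrAfter(long[] boundaries, long pos, long fileLength) {
        for (long b : boundaries) {
            if (b >= pos) {
                return b;
            }
        }
        return fileLength;
    }

    // Returns the effective [from, to) byte range actually read for each nominal split.
    static List<long[]> effectiveRanges(long[] boundaries, long fileLength, long splitSize) {
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long end = Math.min(start + splitSize, fileLength);
            long from = firstBoundaryAtOrAfter(boundaries, start, fileLength);
            long to = firstBoundaryAtOrAfter(boundaries, end, fileLength);
            ranges.add(new long[] {from, to});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical sync boundaries; 0 stands for the first record after the file header.
        long[] boundaries = {0, 30, 70, 104, 150, 190, 230};
        for (long[] r : effectiveRanges(boundaries, 250, 100)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

This should print [0, 104), [104, 230) and [230, 250): every byte is read by
exactly one split, but the nominal edges at 100 and 200 move forward to the
next boundary.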
