The number of mappers is determined by your InputFormat.

In the common case, if the file is smaller than one block (64 MB by
default), there is one mapper for that file. If the file is larger than one
block, Hadoop will split it, and the number of mappers for that file
will be ceiling( (size of file) / (size of block) ).
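A minimal sketch of that arithmetic (illustrative only, not Hadoop's actual split
code; it assumes splits fall exactly on block boundaries, which is the usual
FileInputFormat behavior):

    public class MapperCount {
        // ceiling(fileSize / blockSize) without floating point
        static long mappersForFile(long fileSizeBytes, long blockSizeBytes) {
            return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
        }

        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;  // 64 MB default block size
            // 10 MB file  -> 1 mapper
            System.out.println(mappersForFile(10L * 1024 * 1024, blockSize));
            // 600 MB file -> 10 mappers (600/64 = 9.375, rounded up)
            System.out.println(mappersForFile(600L * 1024 * 1024, blockSize));
        }
    }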

Jeff Zhang



On Thu, Nov 26, 2009 at 5:42 AM, Siddu <siddu.s...@gmail.com> wrote:

> On Thu, Nov 26, 2009 at 5:32 PM, Cubic <cubicdes...@gmail.com> wrote:
>
> > Hi list.
> >
> > I have small files containing data that has to be processed. A file
> > can be small, even down to 10MB (but it can also be 100-600MB large),
> > and contains at least 30000 records to be processed.
> > Processing one record can take 30 seconds to 2 minutes. My cluster is
> > about 10 nodes. Each node has 16 cores.
> >
> Sorry for deviating from the question, but I'm curious to know what "core"
> here refers to?
>
>
> > Can anybody give an idea about how to deal with these small files? It
> > is not quite a common Hadoop task, I know. For example, how many map
> > tasks should I set in this case?
> >
>
>
>
> --
> Regards,
> ~Sid~
> I have never met a man so ignorant that I couldn't learn something from him
>
