But the documentation DOES recommend setting it: http://wiki.apache.org/hadoop/HowManyMapsAndReduces


PS: I am using streaming
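For a streaming job, something along these lines should work (just a sketch; the jar path and the value 40 are placeholders for my setup, and mapred.map.tasks is only a hint that the InputFormat may override):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -D mapred.map.tasks=40 \
        -input /user/me/big-file.txt \
        -output /user/me/out \
        -mapper /bin/cat \
        -reducer /usr/bin/wc

Note the -D option has to come before the streaming-specific options; older streaming releases spelled this -jobconf mapred.map.tasks=40 instead, if I recall correctly.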



Jeff Zhang wrote:
Actually, you do not need to set the number of map tasks; the InputFormat
will compute it for you according to your input data set.

Jeff Zhang


On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign <cubicdes...@gmail.com> wrote:

 The number of mappers is determined by your InputFormat.
In the common case, if a file is smaller than one block (64 MB by
default), there is one mapper for that file. If a file is larger than
one block, Hadoop will split it, and the number of mappers for that
file will be ceiling(file size / block size).
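For example (numbers mine, just to illustrate the formula): a 1 GB file
with the default 64 MB block size is split into ceiling(1024 / 64) = 16
mappers, while a 10 MB file still gets exactly one mapper.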



Hi

Do you mean I should set the number of map tasks to 1?
I want to process this file not on a single node but across the entire
cluster. I need a lot of processing power in order to finish the job in
hours instead of days.

