Quote from the wiki doc:

*The number of map tasks can also be increased manually using the
JobConf<http://wiki.apache.org/hadoop/JobConf>'s
conf.setNumMapTasks(int num). This can be used to increase the number of map
tasks, but will not set the number below that which Hadoop determines via
splitting the input data.*

So the number of map tasks is determined by the InputFormat.
But you can manually set the number of reduce tasks to improve
performance, because the default number of reduce tasks is 1.
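
To make both knobs concrete, here is a minimal sketch against the old
org.apache.hadoop.mapred API that the wiki quote refers to (the class
name, job name, paths, and the 100/16 values are just placeholders I
picked, not anything from this thread):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TaskCountExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TaskCountExample.class);
    conf.setJobName("task-count-example");               // placeholder name

    // Placeholder input/output paths taken from the command line.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Only a hint: Hadoop never runs fewer map tasks than the number of
    // splits the InputFormat computes from the input data.
    conf.setNumMapTasks(100);

    // Honored exactly; the default is 1, so raising it lets the reduce
    // phase run in parallel across the cluster.
    conf.setNumReduceTasks(16);

    JobClient.runJob(conf);   // mapper/reducer classes omitted for brevity
  }
}

Since you are using streaming, the usual route is to pass the same
properties on the command line, e.g. -D mapred.map.tasks=100
-D mapred.reduce.tasks=16 placed before the other streaming options
(older releases used -jobconf instead of -D). And because the map count
is only a lower-bounded hint, a 1 GB input with 64 MB blocks still gets
ceiling(1024/64) = 16 map tasks even if you ask for fewer.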


Jeff Zhang

On Thu, Nov 26, 2009 at 7:58 AM, CubicDesign <cubicdes...@gmail.com> wrote:

> But the documentation DOES recommend setting it:
> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
>
>
> PS: I am using streaming
>
>
>
>
> Jeff Zhang wrote:
>
>> Actually, you do not need to set the number of map tasks; the InputFormat
>> will compute it for you according to your input data set.
>>
>> Jeff Zhang
>>
>>
>> On Thu, Nov 26, 2009 at 7:39 AM, CubicDesign <cubicdes...@gmail.com>
>> wrote:
>>
>>
>>
>>>  The number of mappers is determined by your InputFormat.
>>>
>>>
>>>> In the common case, if a file is smaller than one block (which is 64 MB
>>>> by default), there is one mapper for that file. If a file is larger than
>>>> one block, Hadoop will split it, and the number of mappers for that file
>>>> will be ceiling( (size of file) / (size of block) ).
>>>>
>>>>
>>>>
>>>>
>>>>
>>> Hi
>>>
>>> Do you mean I should set the number of map tasks to 1????
>>> I want to process this file not on a single node but across the entire
>>> cluster. I need a lot of processing power in order to finish the job in
>>> hours instead of days.
>>>
>>>
>>>
>>
>>
>>
>
