The number of mappers is determined by your InputFormat.

In the common case, if the file is smaller than one block (64 MB by
default), Hadoop uses one mapper for that file. If the file is larger than
one block, Hadoop splits it, and the number of mappers for the file is
ceiling( (size of file) / (size of block) ).
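As a rough illustration, here is a minimal Java sketch of that ceiling
calculation. It assumes the default FileInputFormat behavior where the
split size equals the HDFS block size; the 64 MB block size and 1 GB
file size are example values, not anything from your cluster.

    public class SplitCountEstimate {
        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;       // assumed 64 MB default block size
            long fileSize  = 1024L * 1024 * 1024;     // hypothetical 1 GB input file

            // ceiling(fileSize / blockSize) -> number of splits, hence mappers
            long numMappers = (fileSize + blockSize - 1) / blockSize;

            System.out.println("Estimated mappers: " + numMappers); // prints 16
        }
    }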

Hi

Do you mean I should set the number of map tasks to 1?
I want to process this file not on a single node but across the entire cluster. I need a lot of processing power in order to finish the job in hours instead of days.
