On May 5, 2009, at 4:47 AM, Christian Ulrik Søttrup wrote:
Hi all,
I have a job that creates very large local files, so I need to split it
across as many mappers as possible. With the DFS block size I'm
using, this job is only split into 3 mappers. I don't want
to change the cluster-wide HDFS block size because it works for my other jobs.
I would rather keep the big files on HDFS and use
-Dmapred.min.split.size to get more maps to process your data.
http://hadoop.apache.org/core/docs/r0.20.0/mapred_tutorial.html#Job+Input
Arun
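
[For context: the 0.20-era FileInputFormat derives each split's size from
the block size, the configured minimum split size, and a goal size computed
from the requested number of map tasks. The sketch below mimics that formula
with hypothetical file and block sizes to show why a large block size yields
few maps, and how a higher requested map count shrinks the splits:]

```java
// Sketch of how Hadoop's FileInputFormat (old "mapred" API, ~0.20) sizes
// input splits. The sizes used here are hypothetical examples.
public class SplitSizeSketch {
    // Mirrors FileInputFormat.computeSplitSize(goalSize, minSize, blockSize):
    // splits are never smaller than minSize, and otherwise take the smaller
    // of the goal size and the HDFS block size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MB HDFS block
        long totalSize = 384L * 1024 * 1024; // a 384 MB input file

        // goalSize = totalSize / requested number of maps. With only 2
        // requested maps, the block size caps the split at 128 MB, so this
        // file still produces 3 maps (384 MB / 128 MB).
        long fewMaps = computeSplitSize(totalSize / 2, 1, blockSize);
        System.out.println("split with 2 requested maps: " + fewMaps);   // 134217728

        // Requesting 12 maps drives goalSize below the block size, so the
        // file is carved into 12 smaller splits of 32 MB each.
        long manyMaps = computeSplitSize(totalSize / 12, 1, blockSize);
        System.out.println("split with 12 requested maps: " + manyMaps); // 33554432
    }
}
```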