Are the record processing steps bound by a local machine resource: CPU,
disk I/O, or other?
Some disk I/O, but not much compared with the CPU. Basically it is CPU
bound, which is why each machine has 16 cores.
What I often do when I have lots of small files to handle is use the
NLineInputFormat,
Each file contains a complete/independent set of records. I cannot mix
the data resulting from processing two different files.
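To make the constraint concrete, here is a rough sketch of how I understand the NLineInputFormat idea: the job input is a listing file with one data file path per line, and each map task receives N lines of that listing. The code below is plain Java, not Hadoop code; the class name NLineGrouping, the group method and the file paths are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class NLineGrouping {
    // Cuts the lines of a listing file into chunks of at most linesPerSplit
    // lines. Each chunk stands for the input of one map task.
    static List<List<String>> group(List<String> lines, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            int end = Math.min(i + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical listing: one small data file per line.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            paths.add("/data/in/file-" + i + ".dat");
        }
        // With linesPerSplit = 1, every task sees exactly one file, so
        // results from two different files never end up in the same task.
        List<List<String>> splits = group(paths, 1);
        for (int t = 0; t < splits.size(); t++) {
            System.out.println("task " + t + ": " + splits.get(t));
        }
    }
}
```

With more than one line per split, a single task would handle several files, so the mapper itself would have to keep the output of each file separate.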
---------
Ok. I think I need to re-explain my problem :)
While running jobs on these small files, the computation time was almost
5 times longer than expected. It looks like the job was affected by the
number of map tasks that I have (100). I don't know what the best
parameters are in my case (10 MB files).
I have zero reduce tasks.