>>Yeah, but what's the point of using Hadoop then? i.e. we lost all the >>parallelism?
Some jobs do not need it. For example, I am working with the Hive subproject. If I have a table that is smaller than my block size, having a large number of mappers or reducers is counterproductive: Hadoop will start up mappers that never get any data. Setting the job tracker to 'local', or setting map tasks and reduce tasks to 1, makes the job finish faster (20 seconds vs. 10 seconds). If you have a small data set and a system with 8 cores, the MiniMRCluster can possibly be used as an embedded Hadoop. For some jobs the most efficient parallelism might be 1. WordCount of "1 2 3 4 5 6" on the MiniMRCluster test case takes less than two seconds. It may not be the common case, but it may be feasible to use Hadoop in that manner.
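As a sketch, the settings described above would look roughly like this in the classic (pre-YARN) configuration, e.g. in mapred-site.xml or passed per-job; the property names are the old mapred.* ones, so check the names against your Hadoop version:

```
<configuration>
  <!-- Run the job in-process in the client JVM instead of on a cluster. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
  <!-- Hint a single map task and force a single reduce task,
       so a sub-block-size table isn't split across idle tasks. -->
  <property>
    <name>mapred.map.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
  </property>
</configuration>
```

The same effect can be had programmatically via JobConf (conf.set("mapred.job.tracker", "local") and conf.setNumReduceTasks(1)); note that the map task count is only a hint to the framework, while the reduce count is honored exactly.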