Thanks for your response. I have a few more questions regarding optimizations.
1. Do Hadoop clients locally cache the data they last requested?
2. Is the metadata for file blocks on the DataNodes kept in the underlying OS's file system on the NameNode, or is it kept in the NameNode's RAM?
3. If no more mappers can be run on the node that holds the data a mapper has to act on, is Hadoop intelligent enough to run the new mappers on other machines within the same rack?
4. When can a case like the above happen? I mean, when can the configured maximum number of mappers for a TaskTracker have been reached while Hadoop still needs to start more mappers?
5. Are the multiple mappers and reducers run as separate threads within the same TaskTracker process?