Thanks for your response. I have a few more questions regarding optimizations.

1. Do Hadoop clients locally cache the data they last requested?

2. Is the metadata for file blocks on the DataNodes kept in the
underlying OS's file system on the NameNode, or is it kept in the
NameNode's RAM?

3. If no more mapper functions can be run on the node that contains
the data on which the mapper has to act, is Hadoop intelligent enough
to run the new mappers on some machines within the same rack?

4. When can a case like the above happen? I mean, when can it happen
that the configured maximum number of mappers for a TaskTracker has
been reached but Hadoop still needs to start more mappers?

5. Are multiple mappers and reducers run as separate threads within
the same TaskTracker process?
