On Wed, May 6, 2009 at 12:22 PM, Foss User <foss...@gmail.com> wrote:
> 1. Do the reducers of a job start only after all mappers have finished?

The reducer tasks start early so they can begin copying map output, but
your actual reduce function does not run until the map stage is
complete. This is because Hadoop cannot know that all of the data for
any given key has been generated until the map stage is complete.

> 2. Say there are 10 slave nodes. Let us say one of the nodes is very
> slow as compared to other nodes. So, while the mappers in the other 9
> have finished in 2 minutes, the one on the slow one might take 20
> minutes. Is Hadoop intelligent enough to redistribute the key-value
> pairs assigned to this slow node to the free nodes and start new
> mappers on them?

No. However, if you enable "speculative execution", it will schedule a
second invocation of slow map tasks. This is helpful if the slowness is
due to node-specific issues (e.g. a disk going bad, or external load
for whatever reason), but it doesn't help at all if the data is
intrinsically slow to process.

> 3. Is the above true for reducers also?

Speculative execution can be enabled separately for the mappers and
reducers. The relevant configuration variables are
mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution.

> 4. Is it possible to run more than one mapper or one reducer per slave
> node? If yes, can the number of mappers per node or number of reducers
> per node be set anywhere in the conf files?

Yes. See mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum.

-Todd
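
For anyone wondering where these settings live: a minimal sketch of a
mapred-site.xml fragment covering the four variables mentioned above.
The values shown here are illustrative examples only, not tuning
recommendations; the right slot counts depend on your hardware.

```xml
<!-- Sketch of a mapred-site.xml fragment; values are examples only. -->
<configuration>
  <!-- Question 2: re-run slow map tasks speculatively on other nodes. -->
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
  </property>
  <!-- Question 3: the same switch exists separately for reduce tasks. -->
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
  </property>
  <!-- Question 4: concurrent map and reduce slots per tasktracker. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Per-job overrides of the speculative-execution flags can also be set on
the job configuration, while the tasktracker slot maximums are
cluster-side settings read by each tasktracker at startup.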