On Wed, May 6, 2009 at 12:22 PM, Foss User <foss...@gmail.com> wrote:
> 1. Do the reducers of a job start only after all mappers have finished?

The reducer tasks start early so they can begin copying map output, but
your actual reduce function does not run until the map stage is
complete. This is because Hadoop cannot know that all of the data for
any given key has been generated until the map stage is complete.

> 2. Say there are 10 slave nodes. Let us say one of the nodes is very
> slow as compared to other nodes. So, while the mappers in the other 9
> have finished in 2 minutes, the one on the slow one might take 20
> minutes. Is Hadoop intelligent enough to redistribute the key-value
> pairs assigned to this slow node to the free nodes and start new
> mappers on them?

No. However, if you enable "speculative execution", it will schedule a
second invocation of slow map tasks. This is helpful if the slowness is
due to node-specific issues (e.g. a disk going bad, or external load
for whatever reason), but it doesn't help at all if the data is
intrinsically slow to process.

> 3. Is the above true for reducers also?

Speculative execution can be enabled separately for the mappers and
reducers. The relevant configuration variables are
mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution.

> 4. Is it possible to run more than one mapper or one reducer per slave
> node? If yes, can the number of mappers per node or number of reducers
> per node be set anywhere in the conf files?

Yes. See mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum.

-Todd
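
For anyone wondering where these settings live: a minimal sketch of a
mapred-site.xml fragment covering the four variables mentioned above.
The values shown here are illustrative examples only, not tuning
recommendations; the right slot counts depend on your hardware.

```xml
<!-- Sketch of a mapred-site.xml fragment; values are examples only. -->
<configuration>
  <!-- Question 2: re-run slow map tasks speculatively on other nodes. -->
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
  </property>
  <!-- Question 3: the same switch exists separately for reduce tasks. -->
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
  </property>
  <!-- Question 4: concurrent map and reduce slots per tasktracker. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Per-job overrides of the speculative-execution flags can also be set on
the job configuration, while the tasktracker slot maximums are
cluster-side settings read by each tasktracker at startup.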