There are a lot of factors that affect shuffle speed.

Some of them are:

1. The number of reducers concurrently running on a node
2. The number of parallel copier threads pulling in map output
(mapred.reduce.parallel.copies)
3. The size of the individual map outputs. If map outputs are large, they
are shuffled to disk, and there can be contention when several files are
written to disk at the same time
4. The size of the buffer reserved for map outputs on the reducer side
(mapred.job.shuffle.input.buffer.percent).
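For reference, both knobs mentioned above are set in mapred-site.xml. A minimal sketch follows; the values shown are illustrative only (my recollection is that the defaults are 5 copier threads and 0.70 of reducer heap), so tune them for your own cluster rather than copying them verbatim:

```xml
<!-- mapred-site.xml: illustrative values, not recommendations -->
<configuration>
  <property>
    <!-- Parallel fetcher threads each reducer uses to pull map output -->
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <property>
    <!-- Fraction of reducer heap reserved for buffering map outputs -->
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.70</value>
  </property>
</configuration>
```

Raising the copier-thread count can help when many small map outputs must be fetched, at the cost of more concurrent network and disk activity per node.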

Jothi



On 2/28/09 6:55 AM, "Nathan Marz" <nat...@rapleaf.com> wrote:

> The Hadoop shuffle phase seems painstakingly slow. For example, I am
> running a very large job, and all the reducers report a status such as:
> 
> "reduce > copy (14266 of 28243 at 1.30 MB/s)"
> 
> This is after all the mappers are finished. Is it supposed to be so
> slow?
> 
