There are a lot of factors that affect shuffle speed.

Some of them are:

1. The number of reducers concurrently running on a node
2. The number of parallel copier threads pulling in map output
(mapred.reduce.parallel.copies)
3. The size of the individual map outputs. If map outputs are large, they
are shuffled to disk, and there can be contention when several files are
written to disk at the same time
4. The size of the buffer reserved for map outputs on the reducer side
(mapred.job.shuffle.input.buffer.percent).
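For reference, both knobs mentioned above are set in mapred-site.xml. A minimal sketch follows; the values shown are illustrative only (my recollection is that the defaults are 5 copier threads and 0.70 of reducer heap), so tune them for your own cluster rather than copying them verbatim:

```xml
<!-- mapred-site.xml: illustrative values, not recommendations -->
<configuration>
  <property>
    <!-- Parallel fetcher threads each reducer uses to pull map output -->
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <property>
    <!-- Fraction of reducer heap reserved for buffering map outputs -->
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.70</value>
  </property>
</configuration>
```

Raising the copier-thread count can help when many small map outputs must be fetched, at the cost of more concurrent network and disk activity per node.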

Jothi



On 2/28/09 6:55 AM, "Nathan Marz" <nat...@rapleaf.com> wrote:

> The Hadoop shuffle phase seems painstakingly slow. For example, I am
> running a very large job, and all the reducers report a status such as:
> 
> "reduce > copy (14266 of 28243 at 1.30 MB/s)"
> 
> This is after all the mappers are finished. Is it supposed to be so
> slow?
> 
