Re: Shuffle speed?

hc busy Mon, 02 Mar 2009 13:26:49 -0800

There are a few things that caused this to happen to me earlier on.

Make sure to check that it is actually making progress. Sometimes the slowness
is the result of negative progress: the job gets to, say, 10% complete on
reduce and then drops back down to 5%. In that case the status output can show
that line with the slow throughput rate.
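
One rough way to check this is to sample the reducer status line over time and
look for the copied count going backwards. A minimal Python sketch, assuming
the status text has the same shape as the line Nathan quoted below; the
function names are made up for illustration, and how you collect the samples
(web UI, task logs) is left out:

```python
import re

# Matches the reducer status text quoted in this thread, e.g.
# "reduce > copy (14266 of 28243 at 1.30 MB/s)".
STATUS_RE = re.compile(r"reduce > copy \((\d+) of (\d+) at ([\d.]+) MB/s\)")


def parse_copy_status(line):
    """Return (copied, total, mb_per_s), or None if the line does not match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def find_regressions(samples):
    """Return (previous, current) pairs where a sampled value went down."""
    drops = []
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            drops.append((prev, cur))
    return drops
```

Feeding find_regressions the successive copied counts (or the reduce
percentages) for one reducer gives an empty list when progress is steady; any
pair it returns marks a point where the reducer lost ground.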


Changing a few of the settings below did improve things, but ultimately what
fixed it for us was buying more hardware.

;-)

On Sun, Mar 1, 2009 at 10:21 PM, Jothi Padmanabhan <[email protected]> wrote:

> There are a lot of factors that affect shuffle speed.
>
> Some of them are:
>
> 1. The number of reducers running concurrently on a node
> 2. The number of parallel copier threads that are pulling in map data
> (mapred.reduce.parallel.copies)
> 3. The size of the individual map outputs. If map outputs are huge, they are
> shuffled to disk, and there might be some contention if several files are
> written to disk at the same time
> 4. The size of the buffer reserved to accommodate map outputs on the reducer
> side (mapred.job.shuffle.input.buffer.percent)
>
> Jothi
>
>
>
> On 2/28/09 6:55 AM, "Nathan Marz" <[email protected]> wrote:
>
> > The Hadoop shuffle phase seems painfully slow. For example, I am
> > running a very large job, and all the reducers report a status such as:
> >
> > "reduce > copy (14266 of 28243 at 1.30 MB/s)"
> >
> > This is after all the mappers are finished. Is it supposed to be so
> > slow?
> >
>
>
