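Currently, reducers go through 2 phases:
1) Shuffle phase: copying the map outputs
2) Reduce phase: the actual reducing
So by starting the reducers we actually start the shuffle phase. The reduce calls themselves begin only after all the map outputs have been copied, so each call still sees every value for its key. Ideally the shuffle phase should be interleaved with the map phase.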
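
To make the two phases concrete, here is a rough toy model in Python. It is not Hadoop code: run_job and partition are made-up names for illustration, the "copy" is just an in-memory append, and the reduce function is assumed to be a simple sum.

```python
from collections import defaultdict


def partition(key, num_reducers):
    # Hypothetical deterministic partitioner: picks the reducer for a key.
    return sum(key.encode("utf-8")) % num_reducers


def run_job(map_outputs, num_reducers):
    """Toy model of a job.

    map_outputs is a list with one entry per map task; each entry is that
    map's list of (key, value) pairs, in the order the maps finish.
    """
    # One buffer per reducer, filled during the shuffle phase.
    buffers = [defaultdict(list) for _ in range(num_reducers)]
    log = []

    # Shuffle phase: reducers are already running and copy each map's
    # output as soon as that map finishes, while later maps still run.
    for map_id, pairs in enumerate(map_outputs):
        log.append(f"map {map_id} finished")
        for key, value in pairs:
            buffers[partition(key, num_reducers)][key].append(value)
        log.append(f"reducers copied output of map {map_id}")

    # Reduce phase: starts only after every map output has been copied,
    # so each reduce call sees all the values for its key.
    log.append("all maps finished; reduce phase starts")
    results = {}
    for buf in buffers:
        for key in sorted(buf):
            results[key] = sum(buf[key])  # assumed reduce function: sum
    return results, log
```

In this sketch the copying for map 0 is logged before map 1 finishes, which is the interleaving described above, while the summing happens only at the end.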
Amar
On Mon, 3 Mar 2008, Marc Harris wrote:

I noticed when reading http://wiki.apache.org/hadoop/HardwareBenchmarks
the following comment:

"I ran into some odd behavior on Herd2 where if i [ . . . ] the reducers
don't start until the mappers finish, slowing the job significantly."

This puzzled me. I don't see how reducers can ever start before the
mappers have finished. I thought that any given call to a reducer will
supply all the (key,value) pairs for a given value of the key. How can a
reducer start until all the different values for a key are known? And
thus how can a reducer start before all the mappers have finished?

- Marc

