This is normal behavior. The Reducer is guaranteed to receive all the results for its partition in sorted order. No reduce call can start until all the maps have completed, since any map still running could emit a result that sorts ahead of the results the reducer already holds. -C
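To make the ordering argument concrete, here is a minimal sketch in plain Python. It is not Hadoop code: `reduce_partition` and the three map outputs are hypothetical names and data made up for illustration, assuming each map hands the reducer a run already sorted by key.

```python
import heapq
from itertools import groupby


def reduce_partition(map_outputs, reducer):
    """Merge the sorted runs from the given maps and reduce each key group in key order."""
    # Each run is already sorted by key, so a k-way merge keeps the global order.
    merged = heapq.merge(*map_outputs)
    return [
        (key, reducer(value for _, value in group))
        for key, group in groupby(merged, key=lambda kv: kv[0])
    ]


# Hypothetical sorted outputs of three map tasks for one reduce partition.
map_a = [("apple", 1), ("cherry", 1)]
map_b = [("banana", 1), ("cherry", 1)]
map_c = [("apple", 1)]  # finishes last, yet emits the smallest key

# Reducing before map_c has finished yields an incomplete group for "apple".
early = reduce_partition([map_a, map_b], sum)

# Reducing after every map has finished sees each key exactly once, in order.
full = reduce_partition([map_a, map_b, map_c], sum)
```

In the `early` call the group for "apple" is closed with a single value, and the record from `map_c` would arrive after that key had already been reduced. Waiting for every map is what avoids this.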

On Mar 1, 2009, at 9:24 AM, Rasit OZDAS wrote:

Hi!

Whatever code I run on Hadoop, reduce starts a few seconds after map
finishes.
Worse, when I run 10 jobs in parallel (using threads and submitting them one
after another),
all the maps finish sequentially, and the reduces start only 8-10 seconds later.
I also use the reducer as the combiner. My cluster has 6 machines; the namenode
and jobtracker also run as slaves.
The last example had 44 maps and 6 reduces; I have never tried a bigger
job.

What can the problem be? I've read somewhere that this is not the normal
behaviour.
Replication factor is 3.
Thank you in advance for any pointers.

Rasit
