How does a ReduceTask determine which MapTask output to read?

Virajith Jalaparti Wed, 29 Jun 2011 14:29:37 -0700

Hi,

I was wondering what scheduling algorithm Hadoop (version 0.20.2 in particular) uses for a ReduceTask to determine the order in which it reads the map outputs from the various mappers that have been run. In particular, suppose we have 10 maps called map1, map2, ..., map10, and 2 reducers, r1 and r2. Which map's output do r1 and r2 read from first?

Also, suppose that mapred.reduce.parallel.copies is set to 5. Do both r1 and r2 then read from 5 map outputs concurrently?


Thanks,
Virajith
