[EMAIL PROTECTED] wrote:
As far as we understand from the MapRed documentation, all reduce tasks must be
launched after the last map task has finished, i.e. map and reduce must not run
simultaneously. But in the logs we often see records such as "map 80%, reduce 10%",
and many more records where map is less than 100% but reduce is more than 0%.
How should we interpret this?

Hadoop includes the "shuffle" stage in reduce. Currently, the first 25% of a reduce task's progress is copying map outputs to the reduce node. These copies can start as soon as any map task completes, so that when the last map task completes there is very little data remaining to be copied, and the rest of the reduce work can start quickly.
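
As a rough illustration of how a line like "map 80%, reduce 10%" can come about, here is a minimal sketch in Python. It is not Hadoop's actual code: the function name, its parameters, and the assumption that copy progress grows linearly with the number of map outputs copied are all hypothetical. Only the 25% weight comes from the explanation above.

```python
# Hypothetical model of reported reduce progress; not taken from Hadoop's source.
# Share of a reduce task's progress assigned to copying map outputs (from the text above).
COPY_PHASE_WEIGHT = 0.25


def reduce_progress(maps_copied, total_maps, post_copy_fraction=0.0):
    """Return the modeled progress of one reduce task as a value in [0, 1].

    maps_copied        -- number of map outputs already copied to the reduce node
    total_maps         -- total number of map tasks in the job
    post_copy_fraction -- fraction of the remaining reduce work that is done;
                          only counted once every map output has been copied
    """
    if total_maps <= 0:
        raise ValueError("total_maps must be positive")
    if not 0 <= maps_copied <= total_maps:
        raise ValueError("maps_copied must be between 0 and total_maps")
    if not 0.0 <= post_copy_fraction <= 1.0:
        raise ValueError("post_copy_fraction must be between 0 and 1")

    # Assumption: copy progress is linear in the number of outputs copied.
    copy_fraction = maps_copied / total_maps
    if copy_fraction < 1.0:
        # Still copying: the rest of the reduce work cannot have started.
        return COPY_PHASE_WEIGHT * copy_fraction
    return COPY_PHASE_WEIGHT + (1.0 - COPY_PHASE_WEIGHT) * post_copy_fraction


if __name__ == "__main__":
    # 10 map tasks, 8 finished (map 80%), 4 of their outputs copied so far.
    print(f"reduce {reduce_progress(4, 10):.0%}")
```

Under this model, reduce progress stays at or below 25% until the last map output has been copied, which is consistent with seeing a non-zero reduce percentage while map is still below 100%.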

Doug
