On Jul 11, 2008, at 1:35 PM, Mori Bellamy wrote:
hey all,
what dictates the "% complete" bars for map tasks and reduce tasks?
i ask because, for one of my map jobs, the tasks hang at 0% for a
long time until they jump to 100%.
Maps -> fraction of the task's input consumed so far (this is the
normal case when you are processing data on HDFS)
Reduces -> three phases, each a third of the bar: shuffle is 0-33%
(the phase where the output of the maps is copied), merge is 33-66%
(the sorted map outputs are being merged), and reduce is 66-100%
(the user's Reducer.reduce method is being invoked).
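
To make the arithmetic concrete, here is a rough sketch of how those two bars could be computed. It is only an illustration of the rule described above, not Hadoop's actual code: the function names map_progress and reduce_progress and their parameters are made up for this example.

```python
def _clamp(fraction: float) -> float:
    """Keep a progress fraction inside [0.0, 1.0]."""
    return min(1.0, max(0.0, fraction))


def map_progress(bytes_consumed: int, split_length: int) -> float:
    """Hypothetical map bar: share of the task's input read so far."""
    if split_length <= 0:
        # Input size unknown, so nothing can be reported until the task ends.
        return 0.0
    return _clamp(bytes_consumed / split_length)


def reduce_progress(shuffle: float, merge: float, reduce: float) -> float:
    """Hypothetical reduce bar: shuffle, merge and reduce each count for a third.

    Each argument is that phase's own completion, from 0.0 to 1.0.
    """
    return (_clamp(shuffle) + _clamp(merge) + _clamp(reduce)) / 3.0


if __name__ == "__main__":
    # Half of the split has been read.
    print(f"map: {map_progress(64, 128):.0%}")
    # Shuffle done, merge half done, reduce not started.
    print(f"reduce: {reduce_progress(1.0, 0.5, 0.0):.0%}")
```

Under this model a reduce task sitting at 33% has finished copying map outputs and is about to merge. A map task whose consumed input cannot be measured while it runs would stay at 0% and then jump to 100% on completion, which may be what is happening in the job described above.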
Arun