Is the map progress indicator computed as a percentage of maps completed?

-Daniel

On Wed, Jun 4, 2008 at 6:51 PM, Tanton Gibbs <[EMAIL PROTECTED]> wrote:
> From what I've read, there are three reduce phases:
>   1. copy
>   2. sort
>   3. reduce
> From 0 - 33% is the copy phase. I guess if you don't need that phase
> it could skip this completely.
> After 33%, it waits until it is done sorting before outputting status
> again at 66%, then it updates regularly during the reduce phase to
> 100%. This has been my experience, at least.
>
> Tanton
>
> On Wed, Jun 4, 2008 at 4:19 PM, Stuart Sierra <[EMAIL PROTECTED]>
> wrote:
> > How does Hadoop decide when to update the "percent complete" for
> > map/reduce tasks? I've been running a small job (~150 MB) on a
> > pseudo-distributed cluster. "bin/hadoop jar" prints:
> >
> > 08/06/04 17:02:16 INFO mapred.JobClient: map 0% reduce 0%
> > 08/06/04 17:05:52 INFO mapred.JobClient: map 100% reduce 0%
> > 08/06/04 17:06:05 INFO mapred.JobClient: map 100% reduce 66%
> > 08/06/04 17:06:10 INFO mapred.JobClient: map 100% reduce 67%
> > 08/06/04 17:06:17 INFO mapred.JobClient: map 100% reduce 68%
> >
> > And so on until the job completes. What seems odd is that I don't get
> > any feedback at all on the progress of the map task until it reaches
> > 100%, and I get no feedback on the reduce task until it reaches 66%.
> > After that, I get updates every few seconds. The TaskTracker shows
> > the same thing. What might cause this?
> >
> > This is Hadoop 0.17. The input and output are both text, both ~140MB,
> > gzip-compressed down to ~12MB.
> >
> > Thanks,
> > -Stuart
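
For illustration, the three-phase split Tanton describes can be written out as simple arithmetic. This is only a sketch of that description, assuming each phase (copy, sort, reduce) contributes an equal third of the reduce percentage. The function name reduce_progress and its parameters are made up for this example and are not part of any Hadoop API.

```python
# Hypothetical model of the reduce progress figure, based only on the
# description in this thread: three phases, each worth one third.
PHASES = ("copy", "sort", "reduce")


def reduce_progress(phase: str, fraction_done: float) -> float:
    """Return overall reduce progress in percent.

    phase         -- which of the three phases the task is in
    fraction_done -- how far through that phase it is, from 0.0 to 1.0
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0.0 and 1.0")
    # Every phase before the current one counts as fully complete.
    completed_phases = PHASES.index(phase)
    return 100.0 * (completed_phases + fraction_done) / len(PHASES)


if __name__ == "__main__":
    for phase in PHASES:
        print(phase, "starts at", int(reduce_progress(phase, 0.0)), "%")
```

Under this model the reduce phase starts at about 66.7%, which is consistent with the first "reduce 66%" line in Stuart's log if the copy and sort phases finished between status updates.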