Hi Tiago,

Here are a couple of thoughts:

1) How much data are you outputting? Obviously there is a certain amount of IO involved in actually outputting data versus not ;-)

2) Are you using a reduce phase in this job? If so, since you're cutting off the data at map output time, you're also avoiding a whole sort computation, which involves significant network IO, etc.

3) What version of Hadoop are you running?

Thanks
-Todd

On Wed, May 6, 2009 at 12:23 PM, Tiago Macambira <macamb...@gmail.com> wrote:

> I am developing a MR application w/ hadoop that is generating during its
> map phase a really large number of output keys and it is having an abysmal
> performance.
>
> While just reading the said data takes 20 minutes and processing it but not
> outputting anything from the map takes around 30 min, running the full
> application takes around 4 hours. Is this a known or expected issue?
>
> Cheers.
> Tiago Alves Macambira
> --
> "I may be drunk, but in the morning I will be sober, while you will
> still be stupid and ugly." -Winston Churchill