It may simply be that your JVMs are spending their time doing garbage collection instead of running your tasks. Chapter 6 of my book has a section on how to tune your jobs and how to determine what to tune. That chapter is available now as an alpha.
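
A rough way to check the garbage collection theory from inside a JVM is to ask the standard java.lang.management beans how much time the collectors have accumulated and compare that with JVM uptime. The sketch below is only an illustration: the class name GcShare and its helper methods are made up for this example, and where you call it from (for instance, logging the figure at the end of a map task) is an assumption about your job, not something Hadoop does for you.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports what share of JVM uptime went to garbage collection.
public class GcShare {

    // Sum the accumulated collection time (in ms) over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of uptime spent in GC; 0.0 when uptime is not positive.
    static double fraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100.0 * fraction(gc, uptime));
    }
}
```

If the reported share is large in the task JVMs, heap size and object churn in the map function would be the first things I would look at.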

On Wed, May 6, 2009 at 1:29 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hi Tiago,
>
> Here are a couple thoughts:
>
> 1) How much data are you outputting? Obviously there is a certain amount of
> IO involved in actually outputting data versus not ;-)
>
> 2) Are you using a reduce phase in this job? If so, since you're cutting off
> the data at map output time, you're also avoiding a whole sort computation
> which involves significant network IO, etc.
>
> 3) What version of Hadoop are you running?
>
> Thanks
> -Todd
>
> On Wed, May 6, 2009 at 12:23 PM, Tiago Macambira <macamb...@gmail.com> wrote:
>
> > I am developing a MR application w/ hadoop that is generating during it's
> > map phase a really large number of output keys and it is having an abysmal
> > performance.
> >
> > While just reading the said data takes 20 minutes and processing it but not
> > outputting anything from the map takes around 30 min, running the full
> > application takes around 4 hours. Is this a known or expected issue?
> >
> > Cheers.
> > Tiago Alves Macambira
> > --
> > "I may be drunk, but in the morning I will be sober, while you will
> > still be stupid and ugly." -Winston Churchill

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals