Re: Large number of map output keys and performance issues.

jason hadoop Thu, 07 May 2009 08:32:00 -0700

It may simply be that your JVMs are spending their time doing garbage
collection instead of running your tasks.
Chapter 6 of my book has a section on how to tune your jobs and how to
determine what to tune. That chapter is available now as an alpha.
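
One quick way to check the garbage collection theory from inside a JVM is to
ask its management beans how much time the collectors have used so far. This
is only a minimal sketch and is not taken from the book: the class name
GcShare and its two helper methods are made up for illustration, and it relies
only on the standard java.lang.management API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcShare {
    /** Fraction of elapsed time spent in GC; returns 0.0 if uptime is not positive. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    /** Accumulated collection time in milliseconds, summed over all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcMillis = totalGcMillis();
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gcMillis, uptimeMillis, 100.0 * gcFraction(gcMillis, uptimeMillis));
    }
}
```

Logging the same two numbers at the end of a map task (for example, from the
mapper's close() method) should show whether a large share of the task's time
goes to collection. Passing -verbose:gc to the task JVMs through
mapred.child.java.opts is another option if you would rather read GC logs.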


On Wed, May 6, 2009 at 1:29 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hi Tiago,
>
> Here are a couple thoughts:
>
> 1) How much data are you outputting? Obviously there is a certain amount of
> IO involved in actually outputting data versus not ;-)
>
> 2) Are you using a reduce phase in this job? If so, since you're cutting off
> the data at map output time, you're also avoiding a whole sort computation,
> which involves significant network IO, etc.
>
> 3) What version of Hadoop are you running?
>
> Thanks
> -Todd
>
> On Wed, May 6, 2009 at 12:23 PM, Tiago Macambira <macamb...@gmail.com>
> wrote:
>
> > I am developing an MR application with Hadoop that generates a really
> > large number of output keys during its map phase, and its performance is
> > abysmal.
> >
> > Just reading the data takes 20 minutes, and processing it without
> > outputting anything from the map takes around 30 minutes, but running the
> > full application takes around 4 hours. Is this a known or expected issue?
> >
> > Cheers.
> > Tiago Alves Macambira
> > --
> > "I may be drunk, but in the morning I will be sober, while you will
> > still be stupid and ugly." -Winston Churchill
> >
>



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
