What you can probably do is have a combiner do some of the reducing on the
map side, before the single reducer starts. That might help.
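To illustrate the idea (this is a plain-Java sketch of what a combiner buys you, not Kris's actual job): a combiner pre-aggregates each mapper's output locally, so far fewer records, and far less memory, reach the single reduce phase. The word-count data below is made up for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Simulated map output: one (word, 1) record per occurrence.
    static final List<String> MAP_OUTPUT =
            List.of("cat", "dog", "cat", "cat", "dog", "fish");

    // The combiner collapses duplicate keys into partial sums
    // before anything is shipped to the reducer.
    static Map<String, Integer> combine(List<String> records) {
        Map<String, Integer> partial = new HashMap<>();
        for (String word : records) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        Map<String, Integer> combined = combine(MAP_OUTPUT);
        System.out.println(combined.size() + " records reach the reducer instead of "
                + MAP_OUTPUT.size());
    }
}
```

In the old Hadoop API this is wired up with `conf.setCombinerClass(...)`, typically reusing the reducer class when the reduce operation is associative and commutative (as a sum is).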


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>

> I can't afford to have only one reducer, as my dataset is huge... right now
> it is 50GB, and so the output.collect() in the reducer will surely run out
> of Java heap space.
>
> 2009/2/13 Amandeep Khurana <ama...@gmail.com>
>
> > Have only one instance of the reduce task. This will run once your map
> > tasks are completed. You can set this in your job conf by using
> > conf.setNumReduceTasks(1)
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > 2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>
> >
> > > What do you mean by "when you have only 1 reducer"?
> > >
> > > On Fri, Feb 13, 2009 at 4:11 PM, Rasit OZDAS <rasitoz...@gmail.com>
> > wrote:
> > >
> > > > Kris,
> > > > This is the case when you have only 1 reducer.
> > > > If it doesn't have any side effects for you..
> > > >
> > > > Rasit
> > > >
> > > >
> > > > 2009/2/14 Kris Jirapinyo <kjirapi...@biz360.com>:
> > > > > Is there a way to tell Hadoop to not run Map and Reduce concurrently?
> > > > > I'm running into a problem where I set the JVM to -Xmx768m, and it
> > > > > seems like 2 mappers and 2 reducers are running on each machine that
> > > > > only has 1.7GB of RAM, so it complains of not being able to allocate
> > > > > memory... (which makes sense, since 4 x 768MB > 1.7GB). So, if it
> > > > > would just finish the Map and then start on the Reduce, there would
> > > > > be only 2 JVMs running on one machine at any given time, and thus
> > > > > possibly avoid this out-of-memory error.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > M. Raşit ÖZDAŞ
> > > >
> > >
> >
>
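The scheduling behavior Kris asks about can be approximated with two standard knobs: capping the per-TaskTracker task slots, and raising the reduce slowstart fraction so reducers are not launched until the maps are done. A minimal mapred-site.xml sketch; the property names are from the old `mapred.*` namespace and the values are illustrative, not from this thread:

```xml
<!-- Illustrative fragment for mapred-site.xml; values are examples only. -->
<property>
  <!-- Don't launch any reduce task until this fraction of maps is done;
       1.00 means reducers start only after all maps have completed. -->
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.00</value>
</property>
<property>
  <!-- At most 2 concurrent map tasks per TaskTracker. -->
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <!-- At most 2 concurrent reduce tasks per TaskTracker. -->
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```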
