Do your mappers really need 768 MB? You can set the heap size differently
for them than for the reducers: pass a different value for
mapred.child.java.opts to the reducers than to the mappers (by setting it
in the JobConf in your driver program, or with -D
mapred.child.java.opts=whatever if you launch via bin/hadoop).
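
For example, a rough sketch (the class, jar, and heap size are just
placeholders, and this assumes the old org.apache.hadoop.mapred API):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);
    // Smaller heap for this job's child task JVMs:
    conf.set("mapred.child.java.opts", "-Xmx400m");

or from the shell, provided your driver parses generic options via
ToolRunner/GenericOptionsParser:

    bin/hadoop jar myjob.jar MyJob -D mapred.child.java.opts=-Xmx400m in out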

2009/2/13 Amandeep Khurana <ama...@gmail.com>

> Yes, number of output files = number of reducers. There is no downside to
> having a 50GB file. That really isn't too much data. Of course, multiple
> reducers would be much faster. But since you want a sequential run, having
> a single reducer is the only option I am aware of.
>
> You could consider lowering the memory allocated to the JVMs as well so
> that 4 tasks can run. I don't know if you want to do that or not.
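>
> For instance (rough numbers): at -Xmx400m, four concurrent task JVMs need
> about 4 x 400 MB = 1.6 GB, which fits in 1.7 GB of RAM, ignoring what the
> TaskTracker and DataNode daemons themselves use.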
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> 2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>
>
> > Thanks for the recommendation, haven't really looked into how the
> > combiner might be able to help.  Now, are there any downsides to having
> > one 50GB file as an output?  If I understand correctly, the number of
> > reducers you set for your job is the number of files you will get as
> > output.
> >
> > 2009/2/13 Amandeep Khurana <ama...@gmail.com>
> >
> > > What you can probably do is have the combine function do some reducing
> > > before the single reducer starts off. That might help.
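> > >
> > > A rough sketch of the driver change, assuming the old mapred API and a
> > > reduce function that is associative and commutative (MyReducer is a
> > > placeholder for your own reducer class):
> > >
> > >     conf.setCombinerClass(MyReducer.class);
> > >     conf.setReducerClass(MyReducer.class);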
> > >
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
> > >
> > >
> > > 2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>
> > >
> > > > I can't afford to have only one reducer as my dataset is huge...
> > > > right now it is 50GB and so the output.collect() in the reducer will
> > > > surely run out of java heap space.
> > > >
> > > > 2009/2/13 Amandeep Khurana <ama...@gmail.com>
> > > >
> > > > > Have only one instance of the reduce task. This will run once your
> > > > > map tasks are completed. You can set this in your job conf by using
> > > > > conf.setNumReduceTasks(1).
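> > > > >
> > > > > A minimal sketch in the driver (MyJob is a placeholder; this assumes
> > > > > the old org.apache.hadoop.mapred API):
> > > > >
> > > > >     JobConf conf = new JobConf(MyJob.class);
> > > > >     conf.setNumReduceTasks(1);  // one reduce task -> one output file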
> > > > >
> > > > >
> > > > > Amandeep Khurana
> > > > > Computer Science Graduate Student
> > > > > University of California, Santa Cruz
> > > > >
> > > > >
> > > > > 2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>
> > > > >
> > > > > > What do you mean, when I have only 1 reducer?
> > > > > >
> > > > > > On Fri, Feb 13, 2009 at 4:11 PM, Rasit OZDAS <rasitoz...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Kris,
> > > > > > > This is the case when you have only 1 reducer.
> > > > > > > If it doesn't have any side effects for you...
> > > > > > >
> > > > > > > Rasit
> > > > > > >
> > > > > > >
> > > > > > > 2009/2/14 Kris Jirapinyo <kjirapi...@biz360.com>:
> > > > > > > > Is there a way to tell Hadoop to not run Map and Reduce
> > > > > > > > concurrently?  I'm running into a problem where I set the JVM
> > > > > > > > to -Xmx768m and it seems like 2 mappers and 2 reducers are
> > > > > > > > running on each machine that only has 1.7GB of RAM, so it
> > > > > > > > complains of not being able to allocate memory... (which makes
> > > > > > > > sense, since 4 x 768MB > 1.7GB).  So, if it would just finish
> > > > > > > > the Map and then start on Reduce, there would be 2 JVMs running
> > > > > > > > on one machine at any given time, and thus it could possibly
> > > > > > > > avoid this out-of-memory error.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > M. Raşit ÖZDAŞ
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
