Do your mappers really need 768 MB? You can set the heap size for the mappers differently from the reducers: pass a different value for mapred.child.java.opts to the reducers than to the mappers (by setting it on the JobConf in your driver program, or with -D mapred.child.java.opts=whatever if you use bin/hadoop).
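A minimal driver-side sketch of that suggestion, against the old `mapred` API (the class name and heap value are hypothetical; later Hadoop releases also added separate mapred.map.child.java.opts / mapred.reduce.child.java.opts properties for per-phase heaps):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobClient;

public class ExampleDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ExampleDriver.class);
        conf.setJobName("example");

        // JVM options for child tasks (maps and reduces alike).
        // Lowering this from -Xmx768m lets 4 task JVMs fit in ~1.7GB of RAM.
        conf.set("mapred.child.java.opts", "-Xmx400m");

        // ... setMapperClass / setReducerClass / input-output paths ...
        JobClient.runJob(conf);
    }
}
```

The same setting from the command line (assuming the driver uses ToolRunner/GenericOptionsParser so that -D options are picked up):

    bin/hadoop jar example.jar ExampleDriver -D mapred.child.java.opts=-Xmx400m in out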
2009/2/13 Amandeep Khurana <ama...@gmail.com>

> Yes, number of output files = number of reducers. There is no downside to
> having a 50GB file. That really isn't too much data. Of course, multiple
> reducers would be much faster. But since you want a sequential run, a
> single reducer is the only option I am aware of.
>
> You could consider lowering the memory allocated to the JVMs as well, so
> that 4 tasks can run. I don't know if you want to do that or not.
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
> 2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>
>
> > Thanks for the recommendation, haven't really looked into how the
> > combiner might be able to help. Now, are there any downsides to having
> > one 50GB file as an output? If I understand correctly, the number of
> > reducers you set for your job is the number of files you will get as
> > output.
> >
> > 2009/2/13 Amandeep Khurana <ama...@gmail.com>
> >
> > > What you can probably do is have the combine function do some
> > > reducing before the single reducer starts off. That might help.
> > >
> > > Amandeep Khurana
> > > Computer Science Graduate Student
> > > University of California, Santa Cruz
> > >
> > > 2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>
> > >
> > > > I can't afford to have only one reducer as my dataset is
> > > > huge...right now it is 50GB, and so the output.collect() in the
> > > > reducer will surely run out of Java heap space.
> > > >
> > > > 2009/2/13 Amandeep Khurana <ama...@gmail.com>
> > > >
> > > > > Have only one instance of the reduce task. This will run once
> > > > > your map tasks are completed. You can set this in your job conf
> > > > > by using conf.setNumReduceTasks(1).
> > > > >
> > > > > Amandeep Khurana
> > > > > Computer Science Graduate Student
> > > > > University of California, Santa Cruz
> > > > >
> > > > > 2009/2/13 Kris Jirapinyo <kris.jirapi...@biz360.com>
> > > > >
> > > > > > What do you mean when I have only 1 reducer?
> > > > > >
> > > > > > On Fri, Feb 13, 2009 at 4:11 PM, Rasit OZDAS <rasitoz...@gmail.com> wrote:
> > > > > >
> > > > > > > Kris,
> > > > > > > This is the case when you have only 1 reducer.
> > > > > > > If it doesn't have any side effects for you..
> > > > > > >
> > > > > > > Rasit
> > > > > > >
> > > > > > > 2009/2/14 Kris Jirapinyo <kjirapi...@biz360.com>:
> > > > > > > > Is there a way to tell Hadoop to not run Map and Reduce
> > > > > > > > concurrently? I'm running into a problem where I set the
> > > > > > > > JVM to -Xmx768m, and it seems like 2 mappers and 2 reducers
> > > > > > > > are running on each machine that only has 1.7GB of RAM, so
> > > > > > > > it complains of not being able to allocate memory... (which
> > > > > > > > makes sense, since 4 x 768MB > 1.7GB). So, if it would just
> > > > > > > > finish the Map and then start on Reduce, then there would
> > > > > > > > be 2 JVMs running on one machine at any given time, and
> > > > > > > > thus possibly avoid this out-of-memory error.
> > > > > > >
> > > > > > > --
> > > > > > > M. Raşit ÖZDAŞ
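Putting the thread's two suggestions together, a driver could pair a single reduce task with a combiner (a sketch against the old `mapred` API; MyMapper/MyReducer are hypothetical class names, and reusing the reducer as the combiner is only valid when the reduce operation is associative and commutative, e.g. summing counts):

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SequentialReduceDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SequentialReduceDriver.class);
        conf.setJobName("sequential-reduce");

        conf.setMapperClass(MyMapper.class);     // hypothetical mapper
        // Run the reduce logic on each mapper's local output first,
        // shrinking the data shipped to the lone reducer.
        conf.setCombinerClass(MyReducer.class);  // requires associative reduce
        conf.setReducerClass(MyReducer.class);   // hypothetical reducer

        // One reduce task => a single output file (part-00000), and the
        // reduce phase proper only runs after all maps have completed.
        conf.setNumReduceTasks(1);

        // ... input/output paths ...
        JobClient.runJob(conf);
    }
}
```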