I am attempting to specify this for a single job at creation/submission time, not via the general cluster configuration. I am using the new API, so I am adding the values to the conf passed into new Job().
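For reference, this is roughly what my driver looks like (a minimal sketch; the class names and input/output types are placeholders, and the exact property keys depend on the Hadoop version -- on plain 0.20.x the per-map heap may need to go in mapred.child.java.opts instead of mapred.map.child.java.opts):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SpillTuningDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Per-job overrides set *before* the Job is constructed, so they
            // are copied into the job's own configuration at submission time.
            conf.set("mapred.map.child.java.opts", "-Xmx3072m"); // heap for each map task JVM
            conf.setInt("io.sort.mb", 1024);                     // in-memory sort/collection buffer (MB)
            conf.setFloat("io.sort.spill.percent", 0.90f);       // buffer fill ratio before spilling

            Job job = new Job(conf, "csv-import");
            job.setJarByClass(SpillTuningDriver.class);
            // job.setMapperClass(MyCsvMapper.class); // placeholder mapper for the CSV input
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

One thing I noticed while digging: as far as I can tell io.sort.mb is backed by a single byte array in MapTask, so values of 2048 and above get rejected in the releases I've looked at, which may explain part of the trouble; I'm keeping it below 2048 here.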
2012/3/10 WangRamon <ramon_w...@hotmail.com>

> How many map/reduce task slots do you have for each node? If the
> total number is 10, then you will use 10 * 4096mb memory when all tasks are
> running, which is bigger than the total memory 32G you have for each node.
>
> ------------------------------
> Date: Sat, 10 Mar 2012 20:00:13 -0800
> Subject: Mapper Record Spillage
> From: huh...@uhlisys.com
> To: mapreduce-user@hadoop.apache.org
>
> I am attempting to speed up a mapping process whose input is GZIP compressed
> CSV files. The files range from 1-2GB. I am running on a cluster where each
> node has a total of 32GB memory available to use. I have attempted to tweak
> mapred.map.child.jvm.opts with -Xmx4096mb and io.sort.mb to 2048 to
> accommodate the size, but I keep getting Java heap errors or other
> memory-related problems. My row count per mapper is well below the
> Integer.MAX_VALUE limit by several orders of magnitude, and the box is NOT
> using anywhere close to its full memory allotment. How can I specify that
> this map task can have 3-4 GB of memory for the collection, partition and
> sort process without constantly spilling records to disk?