If that is the case, then these two lines should provide more than enough memory on a virtually unused cluster:
    job.getConfiguration().setInt("io.sort.mb", 2048);
    job.getConfiguration().set("mapred.map.child.java.opts", "-Xmx3072M");

With that, a conversion of 1 GB of CSV text to binary primitives should fit easily, yet Java still throws a heap error even when there is 25 GB of memory free.

On Sat, Mar 10, 2012 at 11:50 PM, Harsh J <ha...@cloudera.com> wrote:
> Hans,
>
> You can change memory requirements for the tasks of a single job, but
> not for a single task inside that job.
>
> This is briefly how the 0.20 framework (by default) works: the TT has
> notions only of "slots", and carries a maximum _number_ of simultaneous
> slots it may run. It does not know what each task, occupying one slot,
> will demand in resource terms. Your job then supplies a number of map
> tasks, and the amount of memory required per map task in general, as a
> configuration. TTs then merely start the task JVMs with the provided
> heap configuration.
>
> On Sun, Mar 11, 2012 at 11:24 AM, Hans Uhlig <huh...@uhlisys.com> wrote:
> > That was a typo in my email, not in the configuration. Is the memory
> > reserved for the tasks when the task tracker starts? You seem to be
> > suggesting that I need to set the memory to be the same for all map
> > tasks. Is there no way to override it for a single map task?
> >
> > On Sat, Mar 10, 2012 at 8:41 PM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Hans,
> >>
> >> It's possible you have a typo issue: mapred.map.child.jvm.opts -
> >> such a property does not exist. Perhaps you wanted
> >> "mapred.map.child.java.opts"?
> >>
> >> Additionally, the computation you need to do is: (# of map slots on
> >> a TT * per-map-task heap requirement) should stay below (total RAM -
> >> 2 to 3 GB reserved for the OS and daemons). With your 4 GB
> >> requirement, I guess you can support a max of 6-7 slots per machine
> >> (i.e., not counting reducer heap requirements in parallel).
> >>
> >> On Sun, Mar 11, 2012 at 9:30 AM, Hans Uhlig <huh...@uhlisys.com> wrote:
> >> > I am attempting to speed up a mapping process whose input is
> >> > GZIP-compressed CSV files. The files range from 1-2 GB, and I am
> >> > running on a cluster where each node has a total of 32 GB of
> >> > memory available. I have attempted to tweak
> >> > mapred.map.child.jvm.opts with -Xmx4096mb and io.sort.mb with 2048
> >> > to accommodate the size, but I keep getting Java heap errors or
> >> > other memory-related problems. My row count per mapper is below
> >> > the Integer.MAX_VALUE limit by several orders of magnitude, and
> >> > the box is NOT using anywhere close to its full memory allotment.
> >> > How can I specify that this map task can have 3-4 GB of memory for
> >> > the collection, partition, and sort process without constantly
> >> > spilling records to disk?
> >>
> >> --
> >> Harsh J
>
> --
> Harsh J
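
Putting the thread together, a minimal driver sketch is below. The class name, paths, and toy mapper are hypothetical; only the two configuration keys and the 0.20-era org.apache.hadoop.mapreduce API come from the discussion above. One hedge worth noting: in 0.20 the map-side sort buffer is allocated as a single byte[], so io.sort.mb values of 2048 and above can overflow and fail, which may be related to the heap errors reported here; the sketch uses 1024.

    import java.io.IOException;
    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class CsvToBinaryDriver {

      // Toy mapper (hypothetical): keys each record by its first CSV
      // field and emits the raw line bytes, standing in for the real
      // CSV-to-binary-primitives conversion.
      public static class CsvToBinaryMapper
          extends Mapper<LongWritable, Text, Text, BytesWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          String[] fields = value.toString().split(",", 2);
          byte[] raw = Arrays.copyOf(value.getBytes(), value.getLength());
          context.write(new Text(fields[0]), new BytesWritable(raw));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "csv-to-binary");
        job.setJarByClass(CsvToBinaryDriver.class);

        // Heap for every map-task JVM of this job; per Harsh, this is
        // per-job, not per-task.
        job.getConfiguration().set("mapred.map.child.java.opts", "-Xmx3072M");

        // The sort buffer must fit inside that heap, and in 0.20 it is
        // a single byte[], so staying below 2048 avoids the overflow.
        job.getConfiguration().setInt("io.sort.mb", 1024);

        job.setMapperClass(CsvToBinaryMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BytesWritable.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Applying Harsh's slot arithmetic to these 32 GB nodes: reserving 2-3 GB for the OS and daemons leaves roughly 29-30 GB, so 4 GB map heaps support at most about 7 map slots per machine (fewer once reducer heaps run in parallel), consistent with his 6-7 estimate.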