(Er, not sure how that ± got in there. I meant to type -100, lowered further if it continues to show problems.)
On Sun, Mar 11, 2012 at 7:08 PM, Harsh J <ha...@cloudera.com> wrote:
> Hans,
>
> I don't think io.sort.mb can support a whole 2048 value (it builds one
> array of that size, and the JVM may not allow that). Can you lower it
> to 2000 ± 100 and try again?
>
> On Sun, Mar 11, 2012 at 1:36 PM, Hans Uhlig <huh...@uhlisys.com> wrote:
>> If that is the case, then these two lines should make more than enough
>> memory on a virtually unused cluster:
>>
>> job.getConfiguration().setInt("io.sort.mb", 2048);
>> job.getConfiguration().set("mapred.map.child.java.opts", "-Xmx3072M");
>>
>> A conversion from 1 GB of CSV text to binary primitives should fit
>> easily, but Java still throws a heap error even when there is 25 GB of
>> memory free.
>>
>> On Sat, Mar 10, 2012 at 11:50 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> Hans,
>>>
>>> You can change the memory requirements for the tasks of a single job,
>>> but not for a single task inside that job.
>>>
>>> Briefly, this is how the 0.20 framework works (by default): the TT has
>>> notions only of "slots", and carries a maximum _number_ of
>>> simultaneous slots it may run. It does not know what each task,
>>> occupying one slot, will demand in resource terms. Your job then
>>> supplies a # of map tasks, and the amount of memory required per map
>>> task in general, as configuration. The TTs then merely start the task
>>> JVMs with the provided heap configuration.
>>>
>>> On Sun, Mar 11, 2012 at 11:24 AM, Hans Uhlig <huh...@uhlisys.com> wrote:
>>> > That was a typo in my email, not in the configuration. Is the
>>> > memory reserved for the tasks when the task tracker starts? You
>>> > seem to be suggesting that I need to set the memory to be the same
>>> > for all map tasks. Is there no way to override it for a single map
>>> > task?
>>> >
>>> > On Sat, Mar 10, 2012 at 8:41 PM, Harsh J <ha...@cloudera.com> wrote:
>>> >>
>>> >> Hans,
>>> >>
>>> >> It's possible you have a typo: mapred.map.child.jvm.opts - such a
>>> >> property does not exist. Perhaps you wanted
>>> >> "mapred.map.child.java.opts"?
>>> >>
>>> >> Additionally, the computation you need to do is: (# of map slots
>>> >> on a TT * per-map-task heap requirement) should be < (total RAM -
>>> >> 2 to 3 GB). With your 4 GB requirement, I guess you can support a
>>> >> max of 6-7 slots per machine (i.e., not counting reducer heap
>>> >> requirements running in parallel).
>>> >>
>>> >> On Sun, Mar 11, 2012 at 9:30 AM, Hans Uhlig <huh...@uhlisys.com> wrote:
>>> >> > I am attempting to speed up a mapping process whose input is
>>> >> > GZIP-compressed CSV files. The files range from 1-2 GB, and I am
>>> >> > running on a cluster where each node has a total of 32 GB of
>>> >> > memory available. I have attempted to tweak
>>> >> > mapred.map.child.jvm.opts with -Xmx4096m and io.sort.mb with
>>> >> > 2048 to accommodate the size, but I keep getting Java heap
>>> >> > errors or other memory-related problems. My row count per mapper
>>> >> > is below the Integer.MAX_VALUE limit by several orders of
>>> >> > magnitude, and the box is NOT using anywhere close to its full
>>> >> > memory allotment. How can I specify that this map task can have
>>> >> > 3-4 GB of memory for the collection, partition, and sort process
>>> >> > without constantly spilling records to disk?
>>> >>
>>> >> --
>>> >> Harsh J
>>> >
>>> >
>>>
>>> --
>>> Harsh J
>>
>>
>
> --
> Harsh J

--
Harsh J
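
A note on the io.sort.mb limit Harsh describes above: the 0.20-era map
task allocates its in-memory sort buffer as a single Java byte[], and
Java array lengths are signed 32-bit ints, so 2048 MB expressed in bytes
overflows. A minimal sketch of the arithmetic (the class name is
illustrative; the real allocation inside Hadoop's MapTask is simplified
here):

    public class IoSortMbLimit {
        public static void main(String[] args) {
            int sortMb = 2048;
            // MapTask sizes its sort buffer roughly as new byte[sortMb << 20].
            // 2048 << 20 == 2^31, which wraps around in a signed 32-bit int:
            int bufferBytes = sortMb << 20;
            System.out.println(bufferBytes); // prints -2147483648
            // new byte[bufferBytes] would throw NegativeArraySizeException,
            // so io.sort.mb must stay below 2048 -- hence starting at 2000
            // and stepping down by 100 if problems persist.
        }
    }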
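Harsh's slot budget, worked through for the 32 GB nodes in this thread
(the 2-3 GB reserve for the OS and Hadoop daemons is his rule of thumb,
not a measured value, and the variable names are illustrative):

    public class SlotBudget {
        public static void main(String[] args) {
            int totalRamGb = 32;   // per-node RAM from the original post
            int reservedGb = 3;    // OS + DataNode/TaskTracker overhead (2-3 GB)
            int perMapHeapGb = 4;  // -Xmx4096m per map task
            // (# of map slots * per-map heap) must fit under (RAM - reserve):
            int maxMapSlots = (totalRamGb - reservedGb) / perMapHeapGb;
            System.out.println(maxMapSlots); // 7, matching the "6-7 slots" estimate
            // Reduce-task heaps draw on the same budget, so running reducers
            // in parallel means configuring fewer map slots than this.
        }
    }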